This paper investigates multistep prediction errors for non-stationary autoregressive processes
with both model order and true parameters unknown. We give asymptotic expressions for the
multistep mean squared prediction errors and accumulated prediction errors of two important
methods, plug-in and direct prediction. These expressions not only characterize how the prediction
errors are influenced by the model orders, prediction methods, values of parameters and
unit roots, but also inspire us to construct some new predictor selection criteria that can ultimately
choose the best combination of the model order and prediction method with probability
1. Finally, simulation analysis confirms the satisfactory finite-sample performance of the newly
proposed criteria.
model selection; plug-in method

1. Introduction 

Forecasting theory for stationary series with known true parameters is well studied, but
not much is known about non-stationary models with estimated parameters.
To fill the gap, this paper investigates multistep prediction errors for autoregressive (AR)
processes with a unit root. The plug-in and direct predictors are the two most frequently
used multistep prediction methods, and comparing their relative performance has become
a major issue in forecasting theory. In the case of squared error loss, the plug-in predictor
is obtained by repeatedly using the fitted (by least squares) AR model, with unknown future
values replaced by their own forecasts, and the direct predictor is obtained by
estimating the coefficient vector in the associated multistep prediction formula directly
by linear least squares (see (1.2) and (1.3) below). Recently, many informative guidelines
have been proposed for choosing between these two methods in various time series
models; see Findley [5, 6], Tiao and Tsay [20], Lin and Tsay [16], Ing [9, 10], Chevillon
and Hendry [3] and Lin and Wei [17], among many others. However, a theoretical resolution
of the problem of how to select the optimal multistep predictor in non-stationary
time series still seems to be lacking, at least when estimation uncertainty is taken
into account. In this paper, we develop and rigorously analyze the theoretical
properties of some predictor selection criteria that choose the model order and prediction
method simultaneously.

Assume that observations $x_1,\ldots,x_n$ are generated from a unit root AR model,

$$x_{t+1} = \sum_{i=1}^{p+1} a_i x_{t+1-i} + \varepsilon_{t+1}, \tag{1.1}$$

where $0 \le p < \infty$ is unknown, $a_{p+1} \neq 0$, the $\varepsilon_t$'s are white noises with zero means and common
variance $\sigma^2$, and the characteristic polynomial is

$$A(z) = 1 - a_1 z - \cdots - a_p z^p - a_{p+1} z^{p+1} = (1 - z)(1 - \alpha_1 z - \cdots - \alpha_p z^p),$$

with $\alpha(z) = 1 - \alpha_1 z - \cdots - \alpha_p z^p \neq 0$ for all $|z| \le 1$. The process $\{x_t\}$ is called stationary or stable if all
roots of $A$ are outside the unit circle, and unstable or non-stationary if some roots of $A$
are on the unit circle. For the sake of convenience, the initial conditions are set to $x_t = 0$
for all $t \le 0$. To predict $x_{n+h}$, $h \ge 1$, based on $x_1,\ldots,x_n$ and a working model AR($k$),
one may use the plug-in predictor, $\hat x_{n+h}(k)$, or the direct predictor, $\tilde x_{n+h}(k)$, where

$$\hat x_{n+h}(k) = \mathbf x_n'(k)\,\hat{\mathbf a}_n(h,k) \tag{1.2}$$

and

$$\tilde x_{n+h}(k) = \mathbf x_n'(k)\,\tilde{\mathbf a}_n(h,k), \tag{1.3}$$

with $\mathbf x_j(k) = (x_j,\ldots,x_{j-k+1})'$ being the regressor vector, and $\hat{\mathbf a}_n(h,k)$ and $\tilde{\mathbf a}_n(h,k)$ being the
plug-in and direct estimators, respectively. Note that

$$\Big(\sum_{j=k}^{i-1}\mathbf x_j(k)\mathbf x_j'(k)\Big)\hat{\mathbf a}_i(1,k) = \sum_{j=k}^{i-1}\mathbf x_j(k)\,x_{j+1},$$

$$\Big(\sum_{j=k}^{i-h}\mathbf x_j(k)\mathbf x_j'(k)\Big)\tilde{\mathbf a}_i(h,k) = \sum_{j=k}^{i-h}\mathbf x_j(k)\,x_{j+h},$$

and $\hat{\mathbf a}_i(h,k) = \hat A_i^{h-1}(k)\,\hat{\mathbf a}_i(1,k)$, with $\hat A_i^0(k) = I_k$,

$$\hat A_i(k) = \begin{pmatrix} \hat{\mathbf a}_i(1,k) & \begin{matrix} I_{k-1} \\ \mathbf 0_{k-1}' \end{matrix} \end{pmatrix},$$

and $I_m$ and $\mathbf 0_m$, respectively, denoting an identity matrix and a vector of zeros of dimension $m$.
To assess the prediction performance of $\hat x_{n+h}(k)$ and $\tilde x_{n+h}(k)$, we consider their
mean squared prediction errors (MSPEs),

$$\mathrm{MSPEP}_{n,h}(k) = E\big(x_{n+h} - \hat x_{n+h}(k)\big)^2$$

and

$$\mathrm{MSPED}_{n,h}(k) = E\big(x_{n+h} - \tilde x_{n+h}(k)\big)^2.$$
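The estimators behind (1.2) and (1.3) fit in a few lines of code. The following is a minimal sketch, assuming NumPy. The helper names `regress`, `predictors` and `iterate_one_step` are hypothetical, and the simulated AR(2) with $A(z) = (1 - z)(1 - 0.5z)$ is an arbitrary illustrative choice, not a model from the paper. The sketch also checks numerically that $\hat A_n^{h-1}(k)\hat{\mathbf a}_n(1,k)$ reproduces the recursive plug-in forecast, which is obtained by replacing unknown future values with their own forecasts.

```python
import numpy as np

def regress(x, k, h, i):
    """LS coefficients of x_{j+h} on x_j(k) for j = k, ..., i-h (1-based time; x[0] holds x_1)."""
    rows = range(k, i - h + 1)
    X = np.array([x[j - k:j][::-1] for j in rows])   # row j: x_j(k)' = (x_j, ..., x_{j-k+1})
    y = np.array([x[j + h - 1] for j in rows])       # response x_{j+h}
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predictors(x, k, h):
    """Plug-in (1.2) and direct (1.3) forecasts of x_{n+h} from x_1, ..., x_n with an AR(k) working model."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xn = x[n - k:n][::-1]                            # x_n(k)
    a1 = regress(x, k, 1, n)                         # hat a_n(1, k)
    A = np.zeros((k, k))
    A[:, 0] = a1                                     # hat A_n(k): first column hat a_n(1, k),
    A[:k - 1, 1:] = np.eye(k - 1)                    # identity block above a row of zeros
    a_plug = np.linalg.matrix_power(A, h - 1) @ a1   # hat a_n(h, k)
    a_dir = regress(x, k, h, n)                      # tilde a_n(h, k)
    return float(xn @ a_plug), float(xn @ a_dir)

def iterate_one_step(x, k, h):
    """Plug-in forecast obtained by feeding one-step forecasts back in as future values."""
    a1 = regress(np.asarray(x, dtype=float), k, 1, len(x))
    path = list(x)
    for _ in range(h):
        path.append(float(np.dot(a1, path[-1:-k - 1:-1])))   # (latest value, ..., k-th latest value)
    return path[-1]

# Simulated unit-root AR(2): x_t = 1.5 x_{t-1} - 0.5 x_{t-2} + e_t, with x_t = 0 for t <= 0
rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(200):
    x[t] = (1.5 * x[t - 1] if t >= 1 else 0.0) - (0.5 * x[t - 2] if t >= 2 else 0.0) + rng.normal()

plug, direct = predictors(x, k=3, h=4)
```

Because $\hat A_n(k)$ stores $\hat{\mathbf a}_n(1,k)$ as its first column rather than its first row, a single matrix power yields the $h$-step coefficient vector directly.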

Theoretical investigations of $\mathrm{MSPEP}_{n,h}(k)$ (or $\mathrm{MSPED}_{n,h}(k)$) in non-stationary AR
models date back at least to Fuller and Hasza [7]. When $k \ge p + 1$, an argument similar
to that used in their Theorem 3.1 yields the following asymptotic expressions:

$$\mathrm{MSPEP}_{n,h}(k) = \sigma_h^2 + E\{R_{P,n}(k)\} \tag{1.4}$$

and

$$\mathrm{MSPED}_{n,h}(k) = \sigma_h^2 + E\{R_{D,n}(k)\}, \tag{1.5}$$

where $R_{P,n}(k) = O_p(n^{-1})$, $R_{D,n}(k) = O_p(n^{-1})$ and $\sigma_h^2 = E(\eta_{t,h}^2)$, with $\eta_{t,h} = \sum_{j=0}^{h-1} b_j\varepsilon_{t+h-j}$,
$b_j = \sum_{i=0}^{j} c_i$, $c_0 = 1$ and $c_j$, $j \ge 1$, satisfying $1 + \sum_{j=1}^{\infty} c_j z^j = 1/\alpha(z)$ (note that $\alpha(z)$ is
defined after (1.1)). The first term on the right-hand sides of (1.4) and (1.5), originating
from the random disturbances $\{\varepsilon_t\}$, is common to every multistep predictor, whereas the
second terms on the right-hand sides of (1.4) and (1.5), arising from the estimation
uncertainty, can vary with $k$, with the prediction method and with the parameter
values. However, since only the rates of convergence of the second terms are reported, (1.4)
and (1.5) fail to capture these features, which are indispensable for comparing
predictors. To remedy this, the constants associated with the terms of order
$n^{-1}$ in $E\{R_{P,n}(k)\}$ and $E\{R_{D,n}(k)\}$ need to be characterized. Recently, Ing [8] made
a first step toward this goal. In the special case where $p = 0$ in (1.1) (the random walk
model) and $k = h = 1$, he showed that

$$\lim_{n\to\infty} n\big(\mathrm{MSPEP}_{n,1}(1) - \sigma^2\big) = \lim_{n\to\infty} E\Big\{\frac{x_n^2}{n}\, n^2\big(\hat a_n(1,1) - 1\big)^2\Big\} = 2\sigma^2. \tag{1.6}$$

The main obstacle in dealing with the above expectation, as argued by Ing, is the fact
that the square of the normalized regressor, $x_n^2/n$, and the square of the normalized
estimator, $n^2(\hat a_n(1,1) - 1)^2$, are not asymptotically independent, a situation somewhat
different from that encountered in the stationary case. While Ing was able to overcome
this difficulty, his approach, which focuses only on the random walk model and
one-step-ahead prediction, cannot be directly applied to more general non-stationary
AR models or to multistep prediction.
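A quick Monte Carlo check of (1.6) is sketched below. It assumes NumPy and standard normal errors, so $\sigma^2 = 1$; `scaled_mspe_excess` is a hypothetical helper. Since $x_{n+1} - \hat a_n(1,1)x_n = \varepsilon_{n+1} - (\hat a_n(1,1) - 1)x_n$ and $\varepsilon_{n+1}$ is independent of the data, the excess $\mathrm{MSPEP}_{n,1}(1) - \sigma^2$ equals $E\{(\hat a_n(1,1) - 1)^2 x_n^2\}$, which is what the code averages. The output is a finite-sample approximation, so it should only be close to $2\sigma^2$.

```python
import numpy as np

def scaled_mspe_excess(n, reps, seed=0):
    """Monte Carlo estimate of n * (MSPEP_{n,1}(1) - sigma^2) for a Gaussian random walk, sigma^2 = 1."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.standard_normal((reps, n)), axis=1)     # x_1, ..., x_n with x_0 = 0
    lag, lead = x[:, :-1], x[:, 1:]                            # pairs (x_j, x_{j+1}), j = 1, ..., n-1
    a_hat = (lag * lead).sum(axis=1) / (lag ** 2).sum(axis=1)  # LS estimate hat a_n(1, 1)
    # x_{n+1} - a_hat x_n = e_{n+1} - (a_hat - 1) x_n with e_{n+1} independent of the data,
    # so the excess MSPE equals E{(a_hat - 1)^2 x_n^2}
    return n * np.mean(((a_hat - 1.0) * x[:, -1]) ** 2)

estimate = scaled_mspe_excess(n=200, reps=20000)
print(f"n * (MSPE - sigma^2) is approximately {estimate:.2f}; (1.6) gives the limit 2")
```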

Another subtle problem, related to the direct method, can be illustrated using the
following special case of (1.1):

$$(1 - B)(1 + 0.1B + 0.91B^2)x_{t+1} = (1 - 0.9B + 0.81B^2 - 0.91B^3)x_{t+1} = \varepsilon_{t+1}, \tag{1.7}$$

where $B$ is the back shift operator. Simple algebra yields

$$x_{t+1} = 0.181x_{t-2} + 0.819x_{t-3} + \varepsilon_{t+1} + 0.9\varepsilon_t. \tag{1.8}$$
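The arithmetic behind (1.8) can be reproduced numerically. The sketch below assumes NumPy; `companion_t`, `h_step_coefficients` and `psi_weights` are hypothetical helper names. It builds the matrix whose first column is the coefficient vector (the same layout as $\hat A_i(k)$ above), raises it to the power $h - 1 = 2$, and computes the weights $b_j$ of $1/A(z)$ through the recursion $b_j = \sum_{l=1}^{j} a_l b_{j-l}$, which is equivalent to $b_j = \sum_{i=0}^{j} c_i$. The last computed quantity is the largest index with a nonzero three-step coefficient, i.e., the number of regressors the direct method needs here.

```python
import numpy as np

def companion_t(a):
    """Matrix with first column a and an identity block above a row of zeros."""
    k = len(a)
    A = np.zeros((k, k))
    A[:, 0] = a
    A[:k - 1, 1:] = np.eye(k - 1)
    return A

def h_step_coefficients(a, h):
    """A^{h-1}(k) a(k): coefficients of (x_t, ..., x_{t-k+1}) in the h-step formula."""
    a = np.asarray(a, dtype=float)
    return np.linalg.matrix_power(companion_t(a), h - 1) @ a

def psi_weights(a, m):
    """b_0, ..., b_{m-1}: coefficients of 1/A(z), A(z) = 1 - a_1 z - ... - a_k z^k."""
    b = np.zeros(m)
    b[0] = 1.0
    for j in range(1, m):
        b[j] = sum(a[l - 1] * b[j - l] for l in range(1, min(j, len(a)) + 1))
    return b

a = [0.9, -0.81, 0.91]             # (1.7): x_{t+1} = 0.9 x_t - 0.81 x_{t-1} + 0.91 x_{t-2} + e_{t+1}
coef3 = h_step_coefficients(a, 3)  # coefficients of (x_{t-2}, x_{t-3}, x_{t-4}) when predicting x_{t+1}
b = psi_weights(a, 3)              # weights of e_{t+1}, e_t, e_{t-1} in the prediction error
p3 = int(np.flatnonzero(np.abs(coef3) > 1e-12).max()) + 1   # largest index with a nonzero coefficient
print(coef3.round(3), b.round(3), p3)
```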



As observed in (1.8), the direct method requires only two regressors to make a three-step-ahead
prediction, which reveals the interesting fact that the minimal correct order for
the direct method, determined by the prediction lead time and the unknown parameters, can
be strictly less than that for the plug-in method. In general, model (1.1) can be rewritten
as

$$x_{t+h} = \big(A^{h-1}(p+1)\,\mathbf a(p+1)\big)'\,\mathbf x_t(p+1) + \eta_{t,h}, \quad h \ge 1,$$

where $\mathbf a(k) = (a_1,\ldots,a_k)'$, with $a_j = 0$ for $j > p+1$,

$$A(k) = \begin{pmatrix} \mathbf a(k) & \begin{matrix} I_{k-1} \\ \mathbf 0_{k-1}' \end{matrix} \end{pmatrix},$$

and $A^0(k) = I_k$. Let $\mathbf a(h,p+1) = (a_1(h,p+1),\ldots,a_{p+1}(h,p+1))' = A^{h-1}(p+1)\,\mathbf a(p+1)$.
The above example leads us to define the minimal correct order for the $h$-step direct
method, $p_h = \max\{j : 1 \le j \le p+1,\ a_j(h,p+1) \neq 0\}$. As will be seen in Section 2 below,
comparison results between the plug-in and direct predictors are very complicated in
situations where $p_h < p_1$.

In Section 2, we first derive asymptotic expressions for $\mathrm{MSPEP}_{n,h}(k_1)$ and $\mathrm{MSPED}_{n,h}(k_2)$
up to terms of order $n^{-1}$, where $k_1 \ge p_1$ and $k_2 \ge p_h$. The constants associated with the
terms of order $n^{-1}$ in these expressions characterize how the prediction error is influenced
by the orders, methods (plug-in or direct), values of parameters and even the unit roots.
Based on these expressions, a series of examples (Examples 1–3) is given to show
that, to find the asymptotically optimal (from the MSPE point of view) multistep predictor
among candidate plug-in and direct predictors, prediction orders and prediction
methods must be taken into account simultaneously. The traditional order selection criteria
can no longer serve that purpose. Section 3 is devoted to alleviating this difficulty.
Our strategy is to find a statistic for each $\mathrm{MSPEP}_{n,h}(k)$ and $\mathrm{MSPED}_{n,h}(k)$, $k = 1,\ldots,K$,
and show that the ordering of these statistics coincides with the ordering of their
corresponding multistep MSPEs. Here, $K \ge p_1$ is a known integer. In view of Ing [10],
the statistics adopted in this section are the multistep generalizations of accumulated
prediction errors (APEs) based on sequential plug-in and direct predictors, namely,

$$\mathrm{APEP}_{n,h}(k) = \sum_{i=m_h}^{n-h}\big(x_{i+h} - \mathbf x_i'(k)\,\hat{\mathbf a}_i(h,k)\big)^2 \tag{1.9}$$

and

$$\mathrm{APED}_{n,h}(k) = \sum_{i=m_h}^{n-h}\big(x_{i+h} - \mathbf x_i'(k)\,\tilde{\mathbf a}_i(h,k)\big)^2, \tag{1.10}$$
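A direct, deliberately unoptimized way to evaluate (1.9) and (1.10) is to refit both estimators at every forecast origin $i$ using only $x_1,\ldots,x_i$. The sketch assumes NumPy, a hypothetical helper `apes`, and a user-supplied starting index `m`. The paper defines $m_h$ only as the smallest index at which the estimators are well defined, so the choice $m = 2K + h$ below is our assumption, made to guarantee enough regression rows. For $h = 1$ the plug-in and direct estimators coincide, so the two APEs must agree.

```python
import numpy as np

def regress(x, k, h, i):
    """LS coefficients of x_{j+h} on x_j(k), j = k, ..., i-h (1-based time; x[0] holds x_1)."""
    rows = range(k, i - h + 1)
    X = np.array([x[j - k:j][::-1] for j in rows])
    y = np.array([x[j + h - 1] for j in rows])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def apes(x, k, h, m):
    """(APEP_{n,h}(k), APED_{n,h}(k)) of (1.9)-(1.10), summing over forecast origins i = m, ..., n-h."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    apep = aped = 0.0
    for i in range(m, n - h + 1):
        xi = x[i - k:i][::-1]                            # x_i(k)
        a1 = regress(x, k, 1, i)                         # hat a_i(1, k), uses x_1, ..., x_i only
        A = np.zeros((k, k))
        A[:, 0] = a1
        A[:k - 1, 1:] = np.eye(k - 1)                    # hat A_i(k)
        a_plug = np.linalg.matrix_power(A, h - 1) @ a1   # hat a_i(h, k)
        a_dir = regress(x, k, h, i)                      # tilde a_i(h, k), also uses x_1, ..., x_i only
        target = x[i + h - 1]                            # x_{i+h}
        apep += (target - xi @ a_plug) ** 2
        aped += (target - xi @ a_dir) ** 2
    return apep, aped

# Hypothetical unit-root AR(2): x_t = 1.5 x_{t-1} - 0.5 x_{t-2} + e_t, x_t = 0 for t <= 0
rng = np.random.default_rng(1)
n, K = 200, 4
x = np.zeros(n)
for t in range(n):
    x[t] = (1.5 * x[t - 1] if t >= 1 else 0.0) - (0.5 * x[t - 2] if t >= 2 else 0.0) + rng.normal()

apep1, aped1 = apes(x, k=2, h=1, m=2 * K + 1)
apep2, aped2 = apes(x, k=2, h=2, m=2 * K + 2)
```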
where $m_h$ denotes the smallest positive number such that $\hat{\mathbf a}_i(h,K)$ and $\tilde{\mathbf a}_i(h,K)$ are well
defined for all $i \ge m_h$. Note that $\mathrm{APEP}_{n,1}$ was first proposed by Rissanen [19]. A complete
asymptotic analysis of $\mathrm{APEP}_{n,1}$ was given by Wei [21, 22] under a model more general
than (1.1). However, due to some "nice" properties of $\mathrm{APEP}_{n,1}$ that are missing in its
multistep counterparts (see Remarks 2 and 3 in Section 3), an asymptotic analysis of
(1.9) and (1.10) for non-stationary AR processes is still lacking. We propose a resolution
to this problem, which shows that every $\mathrm{APEP}_{n,h}(k_1)$ and $\mathrm{APED}_{n,h}(k_2)$, with $k_1 \ge p_1$
and $k_2 \ge p_h$, can be asymptotically decomposed into two terms: one, due to
estimation uncertainty, is of order $\log n$, and the other, due to the random disturbances,
is of order $n$ and common to every predictor. More importantly, the constant associated
with the $\log n$ term in $\mathrm{APEP}_{n,h}$ ($\mathrm{APED}_{n,h}$) is exactly the same as the one associated
with the $n^{-1}$ term in its corresponding $\mathrm{MSPEP}_{n,h}$ ($\mathrm{MSPED}_{n,h}$). This special feature
enables us to show that Ing's [10] asymptotically efficient predictor selection procedure
(based on $\mathrm{APEP}_{n,h}$ and $\mathrm{APED}_{n,h}$) for stationary AR processes carries over to
non-stationary cases and hence leads to a unified approach. Note that a predictor selection
procedure is said to be asymptotically efficient if, with probability 1, it chooses the
order/method combination with the minimal MSPE for all sufficiently large $n$; see Section
3 for the exact definition.

Despite its theoretical advantage, Ing's procedure suffers from unsatisfactory finite-sample
performance, as explained at the beginning of Section 4. To fix this flaw, a new
predictor selection method is proposed in Section 4. This new method not only shares
the asymptotic advantage of Ing's procedure but also has satisfactory finite-sample
performance, which is illustrated at the end of Section 4 through a simulation experiment.
Appendices A–C contain the proofs of the theorems in Sections 2–4, respectively.

2. MSPEs of plug-in and direct predictors in the presence of unit roots

Throughout this section, it is assumed that in model (1.1) the $\varepsilon_t$'s are independent random
variables with zero means and variances $\sigma^2 > 0$. Moreover, there are small positive
numbers $\alpha_1$ and $\delta_1$ and a large positive number $M_1$ such that, for $0 < x - y < \delta_1$,

$$\sup_{1 \le m \le t < \infty,\ \|\mathbf v_m\| = 1}\big|F_{t,m,\mathbf v_m}(x) - F_{t,m,\mathbf v_m}(y)\big| \le M_1(x - y)^{\alpha_1}, \tag{2.1}$$

where $\mathbf v_m = (v_1,\ldots,v_m)' \in \mathbb R^m$, $\|\mathbf v_m\|^2 = \sum_{j=1}^m v_j^2$ and $F_{t,m,\mathbf v_m}(\cdot)$ denotes the distribution
function of $(\varepsilon_t,\ldots,\varepsilon_{t+1-m})\mathbf v_m$.

In the case where the $\varepsilon_t$'s are i.i.d., the following lemma provides sufficient conditions
under which (2.1) is fulfilled. The proof of this lemma can be found in Ing and Sin [12].

Lemma 2.1. Let the $\varepsilon_t$'s be i.i.d. random variables satisfying $E(\varepsilon_1) = 0$, $E(\varepsilon_1^2) > 0$ and
$E(|\varepsilon_1|^{\alpha}) < \infty$ for some $\alpha > 2$. Assume also that for some positive constant $M_2 < \infty$,

(2.2)

where $\varphi(t) = E\{\exp(it\varepsilon_1)\}$ is the characteristic function of $\varepsilon_1$. Then, for all $-\infty < t < \infty$,
$m \ge 1$, $\mathbf r_m \in \mathbb R^m$ and $\|\mathbf r_m\| = 1$, there is a finite positive constant $M_3$ such that

$$\sup_{-\infty < x < \infty} f_{t,m,\mathbf r_m}(x) < M_3,$$

where $f_{t,m,\mathbf r_m}(\cdot)$ is the density function of $(\varepsilon_t,\ldots,\varepsilon_{t+1-m})\mathbf r_m$. As a result, (2.1) follows.

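Since (2.2) is satisfied by most absolutely continuous distributions, (2.1) is flexible
enough to accommodate a wide range of time series applications. Note that (2.1) is imposed
to ensure that the inverses of the normalized Fisher information matrices, $R_n^{-1}(k)$ and
$R_{n,h}^{-1}(k)$, have finite positive moments in the sense of (A.1) and (A.19) (in Appendix A),
where

$$R_n(k) = \frac{1}{n}D_n(k)\Big(\sum_{j=k}^{n-1}\mathbf x_j(k)\mathbf x_j'(k)\Big)D_n'(k)$$

and

$$R_{n,h}(k) = \frac{1}{n}\tilde D_n(k)\Big(\sum_{j=k}^{n-h}\mathbf x_j(k)\mathbf x_j'(k)\Big)\tilde D_n'(k),$$

with

$$D_n(k) = \begin{pmatrix}
n^{-1/2} & -\alpha_1 n^{-1/2} & \cdots & -\alpha_{k-2}n^{-1/2} & -\alpha_{k-1}n^{-1/2} \\
1 & -1 & & & \\
 & 1 & -1 & & \\
 & & \ddots & \ddots & \\
 & & & 1 & -1
\end{pmatrix},$$

$\alpha_j = 0$ for $j > p$, and $\tilde D_n(k)$ equal to $D_n(k)$ with $\alpha_i$ replaced by 0 for $i = 1,\ldots,k-1$.
These results will be used to deal with the asymptotic properties of $\mathrm{MSPEP}_{n,h}$ and
$\mathrm{MSPED}_{n,h}$; see the proofs of Theorems 2.2 and 2.3 for details. Theorem 2.2 below
provides an asymptotic expression for $\mathrm{MSPEP}_{n,h}(k)$ with $k \ge p_1$. Before stating the
result, we need to define $S^0(k) = I_k$ and, with $\boldsymbol\alpha(k) = (\alpha_1,\ldots,\alpha_k)'$,

$$S(k) = \begin{pmatrix} \boldsymbol\alpha(k) & \begin{matrix} I_{k-1} \\ \mathbf 0_{k-1}' \end{matrix} \end{pmatrix}.$$

Theorem 2.2. Assume that $\{x_t\}$ satisfies model (1.1). Also assume that $\{\varepsilon_t\}$ satisfies
(2.1) and

$$\sup_t E(|\varepsilon_t|^{\theta_h}) < \infty,$$

where $\theta_h = \max\{8, 2(h+2)\} + \delta$ for some $\delta > 0$. Then, for $k \ge p_1$ and $h \ge 1$,

$$n\big(\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2\big) = 2\sigma^2\Big(\sum_{j=0}^{h-1} b_j\Big)^2 + f_{1,h}(k-1) + o(1), \tag{2.3}$$

where $f_{1,h}(0) = 0$ and, for $k \ge 2$,

$$f_{1,h}(k-1) = \operatorname{tr}\big(\Gamma(k-1)M_h(k-1)\Gamma^{-1}(k-1)M_h'(k-1)\big)\sigma^2,$$

with $M_h(k-1) = \sum_{j=0}^{h-1} b_{h-1-j}S^j(k-1)$, $\Gamma(k-1) = \lim_{j\to\infty}E\big(\mathbf s_j(k-1)\mathbf s_j'(k-1)\big)$, $\mathbf s_j(k-1) = (s_j,\ldots,s_{j-k+2})'$
and $s_j = x_j - x_{j-1}$.

An asymptotic expression for $\mathrm{MSPED}_{n,h}(k)$, with $k \ge p_h$, is given as follows.

Theorem 2.3. Let the assumptions of Theorem 2.2 hold, with $\theta_h$ replaced by $8 + \delta$ for
some $\delta > 0$. Then, for $k \ge p_h$ and $h \ge 1$,

$$n\big(\mathrm{MSPED}_{n,h}(k) - \sigma_h^2\big) = 2\sigma^2\Big(\sum_{j=0}^{h-1} b_j\Big)^2 + f_{2,h}(k-1) + o(1), \tag{2.4}$$

where $f_{2,h}(0) = 0$ and, for $k \ge 2$,

$$f_{2,h}(k-1) = \operatorname{tr}\Big\{\Gamma^{-1}(k-1)\lim_{n\to\infty}\operatorname{cov}\Big(\frac{1}{\sqrt n}\sum_{t=1}^{n}\eta_{t,h}\,\mathbf s_t(k-1)\Big)\Big\},$$

and, for a random vector $\mathbf y$, $\operatorname{cov}(\mathbf y) = E\{(\mathbf y - E(\mathbf y))(\mathbf y - E(\mathbf y))'\}$.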

Theorems 2.2 and 2.3 show that each $n(\mathrm{MSPEP}_{n,h}(k_1) - \sigma_h^2)$ and $n(\mathrm{MSPED}_{n,h}(k_2) - \sigma_h^2)$,
with $k_1 \ge p_1$ and $k_2 \ge p_h$, can be asymptotically decomposed as a sum of two terms. The
first term, $2\sigma^2(\sum_{j=0}^{h-1} b_j)^2$, arising from predicting the non-stationary component in model
(1.1), is common to every predictor, whereas the second term, $f_{1,h}(k_1 - 1)$ (or $f_{2,h}(k_2 - 1)$),
arising from predicting the stationary component in model (1.1), can vary with different
orders and methods. The following examples help provide a better understanding of
Theorems 2.2 and 2.3.

Example 1. When $k \ge \max\{2, p_1\}$ and $h = 2$, by (2.3) and (2.4), it is straightforward
to show that

$$f_{1,2}(k-1) = \{(k-2) + \alpha_{k-1}^2 + 2\alpha_1 b_1 + b_1^2(k-1)\}\sigma^2 \tag{2.5}$$

and

$$f_{2,2}(k-1) = \{(k-1)(1 + b_1^2) + 2\alpha_1 b_1\}\sigma^2, \tag{2.6}$$

which yields

$$f_{2,2}(k-1) - f_{1,2}(k-1) = (1 - \alpha_{k-1}^2)\sigma^2 > 0. \tag{2.7}$$

Moreover, by an argument similar to that used to prove (17) of Ing [9], it can be shown
that for $k \ge \max\{2, p_1\}$ and $h > 2$,

$$f_{2,h}(k-1) - f_{1,h}(k-1) \ge f_{2,2}(k-1) - f_{1,2}(k-1) > 0, \tag{2.8}$$

and hence $\hat x_{n+h}(k)$ is asymptotically more efficient than $\tilde x_{n+h}(k)$ in this case.
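The constants in (2.5)–(2.7) can be checked numerically from the definitions in Theorems 2.2 and 2.3. The sketch below assumes NumPy, unit error variance, and a stationary part $\alpha(z)$ of order $p \le m = k - 1$. The helper names `psi_weights` and `f_terms` are hypothetical. Autocovariances of $s_t$ are obtained from a truncated MA($\infty$) expansion. The long-run covariance in $f_{2,h}$ is evaluated as $\sigma^2\sum_{|l|<h}\rho_l E(\mathbf s_t(m)\mathbf s_{t+l}'(m))$ with $\rho_l = \sum_j b_j b_{j+l}$. This form is our own derivation from the independence of the $\varepsilon_t$'s and is not stated in the text.

```python
import numpy as np

def psi_weights(alpha, nterms):
    """c_0, c_1, ...: coefficients of 1/alpha(z), alpha(z) = 1 - alpha_1 z - ... - alpha_p z^p."""
    c = np.zeros(nterms)
    c[0] = 1.0
    for j in range(1, nterms):
        for i, ai in enumerate(alpha):
            if j - 1 - i >= 0:
                c[j] += ai * c[j - 1 - i]
    return c

def f_terms(alpha, m, h, sigma2=1.0, nterms=2000):
    """Numerical f_{1,h}(m) and f_{2,h}(m) for m = k - 1 >= p."""
    alpha = np.asarray(alpha, dtype=float)
    c = psi_weights(alpha, nterms)
    gam = np.array([sigma2 * c[:nterms - l] @ c[l:] for l in range(m + h)])  # autocovariances of s_t
    b = np.cumsum(c[:h])                                  # b_j = c_0 + ... + c_j
    idx = np.arange(m)

    def cross_cov(l):
        """E(s_t(m) s_{t+l}(m)'), whose (i, j) entry is gamma(|l + i - j|)."""
        return gam[np.abs(l + idx[:, None] - idx[None, :])]

    G = cross_cov(0)                                      # Gamma(m)
    G_inv = np.linalg.inv(G)
    S = np.zeros((m, m))
    S[:min(m, len(alpha)), 0] = alpha[:m]                 # first column alpha(m); alpha_j = 0 for j > p
    S[:m - 1, 1:] = np.eye(m - 1)
    M = sum(b[h - 1 - j] * np.linalg.matrix_power(S, j) for j in range(h))   # M_h(m)
    f1 = np.trace(G @ M @ G_inv @ M.T) * sigma2
    lrv = sigma2 * (b @ b) * G                            # lag-0 term of the long-run covariance
    for l in range(1, h):
        rho = b[:h - l] @ b[l:]
        lrv = lrv + sigma2 * rho * (cross_cov(l) + cross_cov(l).T)
    f2 = np.trace(G_inv @ lrv)
    return f1, f2

for m in (2, 3):
    print(m, f_terms([0.5, -0.3], m=m, h=2))
```

With $\boldsymbol\alpha = (0.5, -0.3)'$ we have $b_1 = 1.5$, so (2.5) and (2.6) give $f_{1,2}(2) = 7.09\sigma^2$ and $f_{2,2}(2) = 8\sigma^2$ for $k = 3$, and $f_{1,2}(3) = 10.25\sigma^2$ and $f_{2,2}(3) = 11.25\sigma^2$ for $k = 4$. The differences agree with (2.7).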

As shown in Section 1, it is possible that $p_h < p_1$. In this case, it is more
interesting to compare $n(\mathrm{MSPEP}_{n,h}(p_1) - \sigma_h^2)$ and $n(\mathrm{MSPED}_{n,h}(p_h) - \sigma_h^2)$ than
MSPEs of the same order. The following example shows that the advantage of the
plug-in predictor illustrated in Example 1 vanishes in this kind of comparison.

Example 2. Assume

$$(1 - B)(1 + \alpha_1 B + \cdots + \alpha_p B^p)x_t = \varepsilon_t,$$

where $p \ge 2$, $1 + \alpha_1 z + \cdots + \alpha_p z^p \neq 0$ for $|z| \le 1$ and $\alpha_p \neq 0$. If $\alpha_1 = 1$, then it is not
difficult to see that $p_2 = p_1 - 1 = p$ and $f_{2,2}(p) - f_{2,2}(p-1) = \sigma^2$. In addition, (2.7)
implies $f_{2,2}(p) - f_{1,2}(p) = (1 - \alpha_p^2)\sigma^2$. As a result,

$$n\{\mathrm{MSPEP}_{n,2}(p_1) - \sigma_2^2\} - n\{\mathrm{MSPED}_{n,2}(p_2) - \sigma_2^2\} \to \alpha_p^2\sigma^2 > 0$$

as $n \to \infty$. Hence $\tilde x_{n+2}(p_2)$ is asymptotically more efficient than $\hat x_{n+2}(p_1)$ in this case.

When $h = 2$ and $p_1 \ge 2$, Examples 1 and 2 together suggest a simple rule: $\hat x_{n+2}(p_1)$
is asymptotically more efficient than $\tilde x_{n+2}(p_2)$ if $p_1 = p_2$, and the conclusion is reversed if
$p_1 > p_2$. This rule, however, fails to hold for $h \ge 3$, as detailed in the following example.

Example 3. Consider the following AR(4) model:

$$(1 - B)(1 + \alpha_1 B)(1 + \alpha_2 B^2)x_t = \{1 - (1 - \alpha_1)B - (\alpha_1 - \alpha_2)B^2 - \alpha_2(1 - \alpha_1)B^3 - \alpha_1\alpha_2 B^4\}x_t = \varepsilon_t,$$

where $0 < \alpha_1 < 1$ and $\alpha_2 = \alpha_1^2 - \alpha_1 + 1$. It is straightforward to show that $p_3 = 3 = p_1 - 1$.
By numerical calculations, we obtain the values of $f_{2,3}(2) - f_{1,3}(3)$ for
$\alpha_1 = 0.1, 0.2, \ldots, 0.9$; see Table 1. According to Table 1, $\tilde x_{n+3}(p_3)$ is asymptotically more
efficient than $\hat x_{n+3}(p_1)$ when $\alpha_1 = 0.1$, $0.2$ and $0.9$, and less efficient than $\hat x_{n+3}(p_1)$ in all
other cases.

Table 1. The values of Diff $= f_{2,3}(2) - f_{1,3}(3)$

$\alpha_1$   0.1     0.2     0.3    0.4    0.5    0.6    0.7    0.8    0.9
Diff     −0.378  −0.013  0.197  0.310  0.354  0.336  0.247  0.051  −0.321

Consequently, when $h \ge 3$, the rankings of $\hat x_{n+h}(p_1)$ and $\tilde x_{n+h}(p_h)$ are determined not
only by whether $p_h < p_1$, but also by the values of the unknown parameters. Simply
determining $p_1$ or $p_h$ through consistent model selection techniques cannot guarantee
optimal multistep prediction (from the MSPE point of view) in situations where
plug-in and direct predictors are simultaneously taken into account. This phenomenon
was first reported by Ing [10] for stationary AR models. The above three examples show
that the same difficulty occurs in the presence of a unit root. In the next two sections,
some proposals toward resolving this problem are given.



3. Multistep accumulated prediction errors

Let $\hat x_{n+h}(k)$, $k = 1,\ldots,K$, and $\tilde x_{n+h}(k)$, $k = 1,\ldots,K$, be candidate plug-in and direct
predictors, where $h \ge 1$ and $K \ge p_1$. For convenience, we use $(k,1)$ to denote $\hat x_{n+h}(k)$
and $(k,2)$ to denote $\tilde x_{n+h}(k)$. In response to the difficulty mentioned at the end of the
previous section, this section attempts to choose the order/method combination having
the minimal MSPE instead of identifying $p_1$ or $p_h$. To this end, the loss functions of $(k,1)$
and $(k,2)$ are defined to be

$$L_{1,h}(k) = \begin{cases} \lim_{n\to\infty} n\big(\mathrm{MSPEP}_{n,h}(k) - \sigma_h^2\big), & \text{if } p_1 \le k \le K, \\ \infty, & \text{if } k < p_1, \end{cases} \tag{3.1}$$

and

$$L_{2,h}(k) = \begin{cases} \lim_{n\to\infty} n\big(\mathrm{MSPED}_{n,h}(k) - \sigma_h^2\big), & \text{if } p_h \le k \le K, \\ \infty, & \text{if } k < p_h, \end{cases} \tag{3.2}$$

respectively. Note that the existence of the above limits is ensured by Theorems 2.2 and
2.3; and, to make the prediction loss due to underspecification much larger
than the one due to overspecification, the loss function values of $(k,1)$ with $k < p_1$ and
$(k,2)$ with $k < p_h$ are set to $\infty$. A predictor selection criterion, $(k_n, j_n)$, with $1 \le k_n \le K$
and $1 \le j_n \le 2$, is said to be asymptotically efficient if

$$P\{(k_n, j_n) \in C_{h,K} \text{ eventually}\} = 1, \tag{3.3}$$

where

$$C_{h,K} = \Big\{(k,j) : 1 \le k \le K,\ 1 \le j \le 2 \text{ and } L_{j,h}(k) = \min_{1 \le k_0 \le K,\ 1 \le j_0 \le 2} L_{j_0,h}(k_0)\Big\}.$$

In other words, with probability 1, $(k_n, j_n)$ chooses a predictor having the minimal loss
function value for all sufficiently large $n$.
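For a given table of loss values, the set $C_{h,K}$ in (3.3) is simply the set of minimizers over all order/method pairs. A minimal sketch follows; `efficient_set` is a hypothetical helper, and the loss values in the usage lines are invented for illustration (with $K = 4$, $p_1 = 3$ and $p_h = 2$), not computed from any model in the paper.

```python
import math

def efficient_set(L1, L2):
    """C_{h,K} of (3.3): pairs (k, j) whose loss L_{j,h}(k) equals the overall minimum.

    L1[k-1] = L_{1,h}(k) and L2[k-1] = L_{2,h}(k) for k = 1, ..., K;
    use math.inf where k < p_1 (plug-in) or k < p_h (direct).
    """
    losses = {(k, 1): v for k, v in enumerate(L1, start=1)}
    losses.update({(k, 2): v for k, v in enumerate(L2, start=1)})
    best = min(losses.values())
    return {pair for pair, v in losses.items() if v == best}

# Invented loss values with K = 4, p_1 = 3 and p_h = 2
L1 = [math.inf, math.inf, 5.0, 6.0]
L2 = [math.inf, 4.5, 5.5, 6.5]
print(efficient_set(L1, L2))
```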

The goal of this section is to show that (3.3) is fulfilled by $(\hat k_n, \hat j_n)$. Here, $(\hat k_n, \hat j_n)$, first
proposed by Ing [10], is obtained through the following procedure:

Step 1. Define $\hat k_n^{(1)} = \arg\min_{1 \le k \le K}\mathrm{APED}_{n,1}(k)$.

Step 2. Define

$$\hat k_n^{(D,h)} = \arg\min_{1 \le k \le K}\mathrm{APED}_{n,h}(k)$$

and define

$$\hat k_n^{(P,h)} = \arg\min_{\hat k_n^{(1)} \le k \le K}\mathrm{APEP}_{n,h}(k).$$

Step 3. If $\mathrm{APED}_{n,h}(\hat k_n^{(D,h)}) > \mathrm{APEP}_{n,h}(\hat k_n^{(P,h)})$, then $(\hat k_n, \hat j_n) = (\hat k_n^{(P,h)}, 1)$; otherwise
$(\hat k_n, \hat j_n) = (\hat k_n^{(D,h)}, 2)$.

Remark 1. Our analysis below implies that the asymptotic properties of $(\hat k_n, \hat j_n)$ remain
unchanged if Step 1 is skipped and $\hat k_n^{(P,h)}$ in Step 2 is defined to be $\arg\min_{1 \le k \le K}\mathrm{APEP}_{n,h}(k)$.
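Once the APE values have been computed for $k = 1,\ldots,K$, the three steps of Procedure I translate directly into code. The sketch below assumes NumPy and a hypothetical helper `procedure_one`. Ties in an arg min are broken in favour of the smallest order; this is our assumption, since the text does not specify a tie-breaking rule. The numbers in the usage lines are invented.

```python
import numpy as np

def procedure_one(apep, aped, aped_one):
    """Order/method pair chosen by Procedure I.

    apep[k-1] = APEP_{n,h}(k), aped[k-1] = APED_{n,h}(k) and aped_one[k-1] = APED_{n,1}(k), k = 1, ..., K.
    Returns (k, 1) for the plug-in predictor or (k, 2) for the direct predictor.
    """
    apep, aped, aped_one = (np.asarray(v, dtype=float) for v in (apep, aped, aped_one))
    k_one = int(np.argmin(aped_one)) + 1                  # Step 1
    k_dir = int(np.argmin(aped)) + 1                      # Step 2, direct method
    k_plug = k_one + int(np.argmin(apep[k_one - 1:]))     # Step 2, plug-in method over k_one <= k <= K
    if aped[k_dir - 1] > apep[k_plug - 1]:                # Step 3
        return k_plug, 1
    return k_dir, 2

aped_one = [10.0, 5.0, 4.0, 4.5]
print(procedure_one([30.0, 20.0, 12.0, 13.0], [25.0, 11.0, 12.5, 14.0], aped_one))
print(procedure_one([30.0, 20.0, 12.0, 13.0], [25.0, 13.0, 12.5, 14.0], aped_one))
```

In the first call the best direct order ($k = 2$) beats the best admissible plug-in order ($k = 3$). In the second call the comparison in Step 3 goes the other way.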

In the sequel, the above procedure will be referred to as Procedure I. We begin by
investigating the asymptotic properties of $\mathrm{APEP}_{n,h}(k)$ and $\mathrm{APED}_{n,h}(k)$ in the correctly
specified case. Note that for $k \ge p_1$,

$$\mathrm{APEP}_{n,h}(k) = \sum_{i=m_h}^{n-h}\big\{\eta_{i,h} - \mathbf x_i'(k)L_{i,h}(k)\big(\hat{\mathbf a}_i(1,k) - \mathbf a(k)\big)\big\}^2, \tag{3.4}$$

where $L_{i,h}(k) = \sum_{j=0}^{h-1} b_{h-1-j}\hat A_i^{j}(k)$; and for $k \ge p_h$,

$$\mathrm{APED}_{n,h}(k) = \sum_{i=m_h}^{n-h}\big\{\eta_{i,h} - \mathbf x_i'(k)\big(\tilde{\mathbf a}_i(h,k) - \mathbf a_D(h,k)\big)\big\}^2, \tag{3.5}$$

where $\mathbf a_D(h,k) = (a_1(h,p+1),\ldots,a_k(h,p+1))'$, with $a_j(h,p+1)$, $1 \le j \le p+1$, defined
in Section 1 and $a_j(h,p+1) = 0$ if $j > p+1$.

Theorem 3.1. Assume that $\{x_t\}$ satisfies model (1.1) and $\{\varepsilon_t\}$ is a sequence of independent
random noises with zero means and common variance $\sigma^2 > 0$. Moreover, assume
$\sup_t E(|\varepsilon_t|^{\alpha}) < \infty$ for some $\alpha > 2$. Then, for $k \ge p_1$ and $h \ge 1$,

$$\begin{aligned}\mathrm{APEP}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 &= \Big\{2\sigma^2\Big(\sum_{j=0}^{h-1} b_j\Big)^2 + f_{1,h}(k-1)\Big\}\log n + o(\log n) \\ &= L_{1,h}(k)\log n + o(\log n) \quad \text{a.s.}\end{aligned} \tag{3.6}$$
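The representation (3.4) rests on the algebraic identity $\hat A_i^{h-1}(k)\hat{\mathbf a}_i(1,k) - A^{h-1}(k)\mathbf a(k) = L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a(k))$ for $k \ge p_1$. It follows by telescoping $\hat A_i^h(k) - A^h(k)$, using $\hat A_i(k) - A(k) = (\hat{\mathbf a}_i(1,k) - \mathbf a(k))\mathbf e_1'$ and the fact that the first entry of $A^j(k)\mathbf e_1$ is $b_j$. Here $\mathbf e_1$, the first unit vector, is a symbol introduced only for this note. The sketch below, assuming NumPy, checks the identity for model (1.7) with a hypothetical perturbed estimate; the helper names are illustrative.

```python
import numpy as np

def companion_t(a):
    """A(k): first column a, identity block above a row of zeros."""
    k = len(a)
    A = np.zeros((k, k))
    A[:, 0] = a
    A[:k - 1, 1:] = np.eye(k - 1)
    return A

def psi_weights(a, h):
    """b_0, ..., b_{h-1}: coefficients of 1/A(z), A(z) = 1 - a_1 z - ... - a_k z^k."""
    b = np.zeros(h)
    b[0] = 1.0
    for j in range(1, h):
        b[j] = sum(a[l - 1] * b[j - l] for l in range(1, min(j, len(a)) + 1))
    return b

h = 3
a = np.array([0.9, -0.81, 0.91, 0.0])               # model (1.7), written with k = 4 >= p_1 = 3
a_hat = a + np.array([0.02, -0.01, 0.03, 0.015])    # hypothetical estimate of a(k)
A, A_hat = companion_t(a), companion_t(a_hat)
b = psi_weights(a, h)
lhs = np.linalg.matrix_power(A_hat, h - 1) @ a_hat - np.linalg.matrix_power(A, h - 1) @ a
L = sum(b[h - 1 - j] * np.linalg.matrix_power(A_hat, j) for j in range(h))   # L_{i,h}(k)
rhs = L @ (a_hat - a)
print(np.max(np.abs(lhs - rhs)))
```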
Remark 2. As shown in (B.18),

$$\mathrm{APEP}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = (1 + o(1))\sum_{i=m_h}^{n-h}\big\{\mathbf x_i'(k)L_{i,h}(k)\big(\hat{\mathbf a}_i(1,k) - \mathbf a(k)\big)\big\}^2 + O(1) \quad \text{a.s.}$$

Therefore, the main task in proving (3.6) is to explore the almost sure properties of the
first term on the right-hand side of the above equality. Through a recursive expression
for $Q_n(1,k)$, where, with $V_n^{-1}(k) = \sum_{i=k}^{n}\mathbf x_i(k)\mathbf x_i'(k)$,

$$Q_n(1,k) = \sum_{i=k}^{n-1}\big\{\mathbf x_i'(k)\big(\hat{\mathbf a}_n(1,k) - \mathbf a(k)\big)\big\}^2 = \Big(\sum_{i=k}^{n-1}\mathbf x_i'(k)\varepsilon_{i+1}\Big)V_{n-1}(k)\Big(\sum_{i=k}^{n-1}\mathbf x_i(k)\varepsilon_{i+1}\Big)$$

is the (second-order) residual sum of squares for one-step predictions, Lai and Wei [14]
established a connection between $Q_n(1,k)$ and its sequential counterpart,

$$\sum_{i=m_1}^{n-1}\big\{\mathbf x_i'(k)\big(\hat{\mathbf a}_i(1,k) - \mathbf a(k)\big)\big\}^2 = \sum_{i=m_1}^{n-1}\Big\{\mathbf x_i'(k)V_{i-1}(k)\sum_{j=k}^{i-1}\mathbf x_j(k)\varepsilon_{j+1}\Big\}^2. \tag{3.7}$$

Based on this connection and some strong laws for martingales, Wei [21, 22] subsequently
obtained an asymptotic expression for the left-hand side of (3.6) in the case of $h = 1$.
However, it is extremely difficult to obtain an analyzable recursive formula for the
multistep analog of $Q_n(1,k)$, $Q_n(h,k) = \sum_{i=k}^{n-h}\{\mathbf x_i'(k)L_{n,h}(k)(\hat{\mathbf a}_n(1,k) - \mathbf a(k))\}^2$, $h \ge 2$,
due to the appearance of $L_{n,h}(k)$. Hence, Wei's approach is not easily extended to the
case of multistep predictions. By observing

$$Q_n(h,k) = \Big(\sum_{i=k}^{n-1}\mathbf x_i'(k)\varepsilon_{i+1}\Big)S_n'(k)V_{n-h}(k)S_n(k)\Big(\sum_{i=k}^{n-1}\mathbf x_i(k)\varepsilon_{i+1}\Big),$$

where

$$S_n(k) = \Big(\sum_{i=k}^{n-h}\mathbf x_i(k)\mathbf x_i'(k)\Big)L_{n,h}(k)\Big(\sum_{i=k}^{n-1}\mathbf x_i(k)\mathbf x_i'(k)\Big)^{-1},$$

Ing [10], under stationary AR processes, adopted

$$Q_n^*(h,k) = \Big(\sum_{i=k}^{n-1}\mathbf x_i'(k)\varepsilon_{i+1}\Big)S'(k)V_{n-h}(k)S(k)\Big(\sum_{i=k}^{n-1}\mathbf x_i(k)\varepsilon_{i+1}\Big)$$

to replace $Q_n(h,k)$, where $S(k)$, a non-random matrix, is the almost sure limit of $S_n(k)$.
He then obtained a recursive formula for $Q_n^*(h,k)$ and established a connection
between $Q_n^*(h,k)$ and $\sum_{i=m_h}^{n-h}\{\mathbf x_i'(k)L_{i,h}(k)(\hat{\mathbf a}_i(1,k) - \mathbf a(k))\}^2$, which further yields
an asymptotic expression for the latter. Unfortunately, when model (1.1) is assumed,
$S_n(k)$, with $k \ge 2$, no longer has an almost sure non-random limit, which makes it
hard to apply Ing's [10] approach to the non-stationary case. To obtain (3.6), extra effort
is made to overcome the above difficulties; see Appendix B for details. For some other
interesting analyses of APEs in various non-standard situations, see de Luna and Skouras
[4] and Bercu [2].
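The quadratic-form representation of $Q_n(h,k)$ used above is exact algebra once $\hat{\mathbf a}_n(1,k) - \mathbf a(k) = V_{n-1}(k)\sum_{i=k}^{n-1}\mathbf x_i(k)\varepsilon_{i+1}$, which holds for $k \ge p_1$ when the data follow (1.1) with zero initial conditions. The numerical check below is a minimal sketch assuming NumPy. It uses a hypothetical unit-root AR(2) with $A(z) = (1 - z)(1 - 0.5z)$, fitted with $k = 3$, and $h = 2$; the helper `xvec` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([1.5, -0.5, 0.0])          # hypothetical A(z) = (1 - z)(1 - 0.5 z), padded to k = 3 >= p_1 = 2
k, h, n = len(a), 2, 200
e = rng.standard_normal(n + 1)          # e[t] = e_t for t = 1, ..., n (e[0] unused)
xs = np.zeros(n + k)                    # xs[t + k - 1] = x_t; entries for t <= 0 stay 0
for t in range(1, n + 1):
    xs[t + k - 1] = sum(a[l - 1] * xs[t - l + k - 1] for l in range(1, k + 1)) + e[t]

def xvec(j):
    """x_j(k) = (x_j, ..., x_{j-k+1})'."""
    return xs[j:j + k][::-1]

G1 = sum(np.outer(xvec(j), xvec(j)) for j in range(k, n))            # V_{n-1}^{-1}(k)
Gh = sum(np.outer(xvec(j), xvec(j)) for j in range(k, n - h + 1))    # V_{n-h}^{-1}(k)
u = sum(xvec(j) * e[j + 1] for j in range(k, n))                     # sum_{i=k}^{n-1} x_i(k) e_{i+1}
a_hat = np.linalg.solve(G1, sum(xvec(j) * xs[j + k] for j in range(k, n)))   # hat a_n(1, k)

A_hat = np.zeros((k, k))
A_hat[:, 0] = a_hat
A_hat[:k - 1, 1:] = np.eye(k - 1)
b = [1.0, a[0]]                                                      # b_0, b_1 (enough for h = 2)
L = sum(b[h - 1 - j] * np.linalg.matrix_power(A_hat, j) for j in range(h))   # L_{n,h}(k)

Q_direct = sum(float(xvec(i) @ L @ (a_hat - a)) ** 2 for i in range(k, n - h + 1))
S_n = Gh @ L @ np.linalg.inv(G1)
Q_formula = float(u @ S_n.T @ np.linalg.inv(Gh) @ S_n @ u)
print(Q_direct, Q_formula)
```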

Theorem 3.2. Let the assumptions of Theorem 3.1 hold. Then, for $k \ge p_h$ and $h \ge 1$,

$$\begin{aligned}\mathrm{APED}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 &= \Big\{2\sigma^2\Big(\sum_{j=0}^{h-1} b_j\Big)^2 + f_{2,h}(k-1)\Big\}\log n + o(\log n) \\ &= L_{2,h}(k)\log n + o(\log n) \quad \text{a.s.}\end{aligned}$$

Remark 3. As indicated in (B.35),

$$\mathrm{APED}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = (1 + o(1))\sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-h}(k)\sum_{j=k}^{i-h}\mathbf x_j(k)\eta_{j,h}\Big\}^2 + O(1) \quad \text{a.s.} \tag{3.8}$$

While

$$\sum_{i=m_h}^{n-h}\Big\{\mathbf x_i'(k)V_{i-h}(k)\sum_{j=k}^{i-h}\mathbf x_j(k)\eta_{j,h}\Big\}^2$$

looks very similar to (3.7), Wei's approach for the one-step APE still cannot be applied
to it, because $\sum_{j=k}^{i}\mathbf x_j(k)\eta_{j,h}$, $h > 1$, is not a martingale transform. While Ing [10]
resolved this difficulty in the stationary AR model, his method, which relies heavily
on the stationarity assumption, is not applicable to unit root processes.



Remark 4. Theorems 2.2, 2.3, 3.1 and 3.2 together reveal a remarkable fact: the
constants associated with the terms of order $n^{-1}$ in $\mathrm{MSPEP}_{n,h}(k_1)$ and $\mathrm{MSPED}_{n,h}(k_2)$,
with $k_1 \ge p_1$ and $k_2 \ge p_h$, are exactly the same as the constants associated with the
terms of order $\log n$ in their corresponding multistep APEs. Although $\mathrm{MSPEP}_{n,h}(k_1)$ and
$\mathrm{MSPED}_{n,h}(k_2)$ are unobservable, this property allows us to preserve their asymptotic
rankings through the values of the associated multistep APEs, which can be easily
obtained from the data. This is also the driving motivation for using $(\hat k_n, \hat j_n)$ in
model (1.1).
Before showing the asymptotic efficiency of $(\hat k_n, \hat j_n)$, we need to investigate the asymptotic
properties of $\mathrm{APED}_{n,h}(k)$ in misspecified cases.

Theorem 3.3. Let the assumptions of Theorem 3.1 hold. Then, for $1 \le k < p_h$ and
$h \ge 1$,

$$\liminf_{n\to\infty}\frac{1}{n}\Big(\mathrm{APED}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2\Big) > 0 \quad \text{a.s.} \tag{3.9}$$

We are now in a position to state the main result of this section. 

Theorem 3.4. Let the assumptions of Theorem 3.1 hold. Then, for $K \ge p_1$, $(\hat k_n, \hat j_n)$ is
asymptotically efficient in the sense of (3.3).

Remark 5. Since Ing [10] showed that $(\hat k_n, \hat j_n)$ is also asymptotically efficient in stationary
AR models, Theorem 3.4, together with Ing's result, provides a unified approach
for choosing the (asymptotically) optimal multistep predictor for AR processes with or
without unit roots. While it is possible to select multistep predictors after unit root
tests are performed (which means that the selection procedure will be carried out on
the differenced data if the unit root hypothesis is not rejected), all unit root tests
suffer from low power when the process is near unity. One can hardly expect a reliable
selection/prediction result once the process is erroneously differenced.

Before leaving this section, we note that, to analyze the effect of estimating the
mean on the performance of the predictors, one may consider a unit root AR model
with drift,

$$A(B)x_{t+1} = \beta + \varepsilon_{t+1}, \tag{3.10}$$

where $A(\cdot)$ is defined after (1.1) and $-\infty < \beta < \infty$ is some real number. In the case of
$h = 1$, we have obtained (through non-trivial modifications of the proofs of the results in
Sections 2 and 3) that if $\beta \neq 0$, then for $k \ge p_1$,

$$\lim_{n\to\infty} n\{E(x_{n+1} - \hat x_{n+1}(k))^2 - \sigma^2\} = (k+3)\sigma^2$$

and

$$\mathrm{APEP}_{n,1}(k) - \sum_{i=m_1}^{n-1}\varepsilon_{i+1}^2 = \sigma^2(k+3)\log n + o(\log n) \quad \text{a.s.},$$

where $\hat x_{n+1}(k) = \tilde x_{n+1}(k) = \mathbf w_n'(k)\hat{\mathbf a}_n(1,k)$, with $\mathbf w_j(k) = (1, \mathbf x_j'(k))'$ and $\hat{\mathbf a}_i(1,k)$ satisfying
$\{\sum_{j=k}^{i-1}\mathbf w_j(k)\mathbf w_j'(k)\}\hat{\mathbf a}_i(1,k) = \sum_{j=k}^{i-1}\mathbf w_j(k)x_{j+1}$. Moreover, if $\beta = 0$, then for $k \ge p_1$,

$$\lim_{n\to\infty} n\{E(x_{n+1} - \hat x_{n+1}(k))^2 - \sigma^2\} = (k+2)\sigma^2$$

and

$$\mathrm{APEP}_{n,1}(k) - \sum_{i=m_1}^{n-1}\varepsilon_{i+1}^2 = \sigma^2(k+2)\log n + o(\log n) \quad \text{a.s.}$$

As the above four equalities show, the correspondence between APE and MSPE
remains valid under model (3.10), regardless of whether $\beta = 0$ or not. Therefore, it is
natural to conjecture that under model (3.10), (i) this correspondence can be extended
to the case of $h > 1$; and (ii) Procedure I is still asymptotically efficient for multistep
prediction. However, we shall not pursue a proof of these conjectures here, since it goes
beyond the scope of this paper.

4. New criteria

Although Theorem 3.4 shows that $(\hat k_n, \hat j_n)$ is asymptotically efficient in the sense of (3.3),
its finite-sample performance is, surprisingly, rather unsatisfactory. Simulation results
show that the rankings of $\mathrm{APEP}_{n,h}(k_1)$ and $\mathrm{APED}_{n,h}(k_2)$ are often inconsistent with
the rankings of $L_{1,h}(k_1)$ and $L_{2,h}(k_2)$ even when $n \ge 500$. One possible explanation for this
phenomenon is as follows. In view of (3.4) and (3.5), for $k_1 \ge p_1$ and $k_2 \ge p_h$,

$$\begin{aligned}
&\mathrm{APEP}_{n,h}(k_1) - \mathrm{APED}_{n,h}(k_2) \\
&\quad = \sum_{i=m_h}^{n-h}\big\{\mathbf x_i'(k_1)L_{i,h}(k_1)\big(\hat{\mathbf a}_i(1,k_1) - \mathbf a(k_1)\big)\big\}^2 - \sum_{i=m_h}^{n-h}\big\{\mathbf x_i'(k_2)\big(\tilde{\mathbf a}_i(h,k_2) - \mathbf a_D(h,k_2)\big)\big\}^2 \\
&\qquad - 2\sum_{i=m_h}^{n-h}\mathbf x_i'(k_1)L_{i,h}(k_1)\big(\hat{\mathbf a}_i(1,k_1) - \mathbf a(k_1)\big)\eta_{i,h} + 2\sum_{i=m_h}^{n-h}\mathbf x_i'(k_2)\big(\tilde{\mathbf a}_i(h,k_2) - \mathbf a_D(h,k_2)\big)\eta_{i,h} \\
&\quad = \text{(I)} - \text{(II)} - \text{(III)} + \text{(IV)}.
\end{aligned} \tag{4.1}$$

While the cross-product terms (III) and (IV) in (4.1) are almost surely of order $o(\log n)$
and asymptotically negligible compared to (I) and (II) (see Appendix B), we have found
that their finite-sample values can differ remarkably. This "nonuniformity" causes
"rank distortion" when we perform cross-method comparisons.

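To overcome the above difficulty, we consider using $\mathrm{PMIC}_{n,h}(k)$ and $\mathrm{DMIC}_{n,h}(k)$ to
replace $\mathrm{APEP}_{n,h}(k)$ and $\mathrm{APED}_{n,h}(k)$ in Procedure I, where

$$\mathrm{PMIC}_{n,h}(k) = \hat\sigma_{P,n}^2(h,k) + \operatorname{tr}\Big\{\Big(\sum_{j=k}^{n-h}\mathbf x_j(k)\mathbf x_j'(k)\Big)L_{h,n}(k)\Big(\sum_{j=k}^{n-h}\mathbf x_j(k)\mathbf x_j'(k)\Big)^{-1}L_{h,n}'(k)\Big\}\hat\sigma_n^2 C_n \tag{4.2}$$

and

$$\mathrm{DMIC}_{n,h}(k) = \hat\sigma_{D,n}^2(h,k) + \operatorname{tr}\Big\{\Big(\sum_{j=k}^{n-h}\mathbf x_j(k)\mathbf x_j'(k)\Big)^{-1}\Big(\sum_{j=k}^{n-2h+1}\mathbf z_j(k)\mathbf z_j'(k)\Big)\Big\}\hat\sigma_n^2 C_n, \tag{4.3}$$

where $\lim_{n\to\infty} C_n = 0$ and $\liminf_{n\to\infty} C_n n/\log n > 0$. Note that
$\hat\sigma_{P,n}^2(h,k) = (n - h - K)^{-1}\sum_{j=K}^{n-h}\{x_{j+h} - \hat{\mathbf a}_n'(h,k)\mathbf x_j(k)\}^2$ and
$\hat\sigma_{D,n}^2(h,k) = (n - h - K)^{-1}\sum_{j=K}^{n-h}\{x_{j+h} - \tilde{\mathbf a}_n'(h,k)\mathbf x_j(k)\}^2$
are the $h$-step residual mean squared errors obtained from the $k$-regressor plug-in and direct
methods, respectively; $\hat\sigma_n^2 = \hat\sigma_{P,n}^2(1,K) = \hat\sigma_{D,n}^2(1,K)$ is the one-step residual mean squared error
obtained from the largest candidate model; $\mathbf z_j(k) = \sum_{l=0}^{h-1}\hat b_{l,n}\mathbf x_{j+l}(k)$; and
$L_{h,n}(k) = \sum_{j=0}^{h-1}\hat b_{j,n}\hat A_n^{h-1-j}(k)$, where $\hat b_{0,n} = 1$ and, for $j \ge 1$, $\hat b_{j,n} = \sum_{l=1}^{j}\hat b_{j-l,n}\hat a_{l,n}(1,K)$,
with $(\hat a_{1,n}(1,K),\ldots,\hat a_{K,n}(1,K))' = \hat{\mathbf a}_n(1,K)$ and $\hat a_{l,n}(1,K) = 0$ if $l > K$.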
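To make (4.2) and (4.3) concrete, the sketch below computes both criteria for one candidate order. It assumes NumPy, 1-based time with `x[0]` holding $x_1$, and a hypothetical helper `mic`. The simulated series (a unit-root AR(3) with unit-variance noise) and the choice $C_n = \log n/n$ are only for illustration. For $h = 1$ the two criteria coincide, because $L_{1,n}(k) = I_k$, $\mathbf z_j(k) = \mathbf x_j(k)$ and both trace terms reduce to $k$; this gives a simple consistency check.

```python
import numpy as np

def ls(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def design(x, k, h, lo, hi):
    """Rows x_j(k)' and responses x_{j+h} for j = lo, ..., hi (1-based time; x[0] holds x_1)."""
    X = np.array([x[j - k:j][::-1] for j in range(lo, hi + 1)])
    y = np.array([x[j + h - 1] for j in range(lo, hi + 1)])
    return X, y

def mic(x, k, h, K, Cn):
    """(PMIC_{n,h}(k), DMIC_{n,h}(k)) following (4.2)-(4.3), for 1 <= k <= K."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    XK, yK = design(x, K, 1, K, n - 1)
    aK = ls(XK, yK)                                        # hat a_n(1, K)
    s2 = np.sum((yK - XK @ aK) ** 2) / (n - 1 - K)         # hat sigma_n^2
    b = np.zeros(h)                                        # hat b_{j,n}, j = 0, ..., h-1
    b[0] = 1.0
    for j in range(1, h):
        b[j] = sum(b[j - l] * aK[l - 1] for l in range(1, min(j, K) + 1))
    a1 = ls(*design(x, k, 1, k, n - 1))                    # hat a_n(1, k)
    A = np.zeros((k, k))
    A[:, 0] = a1
    A[:k - 1, 1:] = np.eye(k - 1)                          # hat A_n(k)
    a_plug = np.linalg.matrix_power(A, h - 1) @ a1         # hat a_n(h, k)
    Xh, yh = design(x, k, h, k, n - h)
    a_dir = ls(Xh, yh)                                     # tilde a_n(h, k)
    XR, yR = design(x, k, h, K, n - h)                     # residuals over j = K, ..., n-h
    s2_p = np.sum((yR - XR @ a_plug) ** 2) / (n - h - K)   # hat sigma_{P,n}^2(h, k)
    s2_d = np.sum((yR - XR @ a_dir) ** 2) / (n - h - K)    # hat sigma_{D,n}^2(h, k)
    G = Xh.T @ Xh                                          # sum_{j=k}^{n-h} x_j(k) x_j(k)'
    L = sum(b[j] * np.linalg.matrix_power(A, h - 1 - j) for j in range(h))   # L_{h,n}(k)
    Z = np.array([sum(b[l] * x[j + l - k:j + l][::-1] for l in range(h))     # z_j(k)
                  for j in range(k, n - 2 * h + 2)])
    pmic = s2_p + np.trace(G @ L @ np.linalg.solve(G, L.T)) * s2 * Cn
    dmic = s2_d + np.trace(np.linalg.solve(G, Z.T @ Z)) * s2 * Cn
    return pmic, dmic

# Hypothetical unit-root AR(3) with unit-variance noise: x_t = 0.2 x_{t-2} + 0.8 x_{t-3} + e_t
rng = np.random.default_rng(3)
n, K = 300, 5
x = np.zeros(n)
for t in range(n):
    x[t] = (0.2 * x[t - 2] if t >= 2 else 0.0) + (0.8 * x[t - 3] if t >= 3 else 0.0) + rng.normal()
Cn = np.log(n) / n
scores = {k: mic(x, k, 2, K, Cn) for k in range(1, K + 1)}
```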

Here, we briefly describe some of the theoretical rationale behind this new criterion.
Observe that

$$\begin{aligned}
&\mathrm{PMIC}_{n,h}(k_1) - \mathrm{DMIC}_{n,h}(k_2) = \hat\sigma_{P,n}^2(h,k_1) - \hat\sigma_{D,n}^2(h,k_2) \\
&\quad + \Big[\operatorname{tr}\Big\{\Big(\sum_{j=k_1}^{n-h}\mathbf x_j(k_1)\mathbf x_j'(k_1)\Big)L_{h,n}(k_1)\Big(\sum_{j=k_1}^{n-h}\mathbf x_j(k_1)\mathbf x_j'(k_1)\Big)^{-1}L_{h,n}'(k_1)\Big\} \\
&\qquad - \operatorname{tr}\Big\{\Big(\sum_{j=k_2}^{n-h}\mathbf x_j(k_2)\mathbf x_j'(k_2)\Big)^{-1}\Big(\sum_{j=k_2}^{n-2h+1}\mathbf z_j(k_2)\mathbf z_j'(k_2)\Big)\Big\}\Big]\hat\sigma_n^2 C_n.
\end{aligned} \tag{4.4}$$

It is shown in Appendix C that when $k_1 \ge p_1$ and $k_2 \ge p_h$,

$$\begin{aligned}
&\sigma^2\Big[\operatorname{tr}\Big\{\Big(\sum_{j=k_1}^{n-h}\mathbf x_j(k_1)\mathbf x_j'(k_1)\Big)L_{h,n}(k_1)\Big(\sum_{j=k_1}^{n-h}\mathbf x_j(k_1)\mathbf x_j'(k_1)\Big)^{-1}L_{h,n}'(k_1)\Big\} \\
&\quad - \operatorname{tr}\Big\{\Big(\sum_{j=k_2}^{n-h}\mathbf x_j(k_2)\mathbf x_j'(k_2)\Big)^{-1}\Big(\sum_{j=k_2}^{n-2h+1}\mathbf z_j(k_2)\mathbf z_j'(k_2)\Big)\Big\}\Big] = L_{1,h}(k_1) - L_{2,h}(k_2) + o(1) \quad \text{a.s.}
\end{aligned} \tag{4.5}$$

Therefore, the trace terms in (4.2) and (4.3) preserve the rankings of their
corresponding loss functions. On the other hand, for $k_1 \ge p_1$ and $k_2 \ge p_h$, the weight
associated with the trace terms, $C_n$, asymptotically dominates $\hat\sigma_{P,n}^2(h,k_1) - \hat\sigma_{D,n}^2(h,k_2)$
(see (C.2)), which helps to protect the trace-term effects in (4.4) from being distorted
by $\hat\sigma_{P,n}^2(h,k_1) - \hat\sigma_{D,n}^2(h,k_2)$. In fact, our simulations reveal that this domination usually
occurs quite early (particularly when $C_n$ is relatively large), and hence considerably
alleviates the dilemma encountered by Procedure I in finite samples. (Note that $\hat\sigma_{P,n}^2(h,k)$ and
$\hat\sigma_{D,n}^2(h,k)$ cannot be dropped from (4.2) and (4.3), because they are necessary for
preventing underspecification; see, e.g., (C.1).) The following is the new predictor selection
procedure (which is referred to as Procedure II) and its asymptotic property.

Step 1. Define $\hat\theta_n^{(1)} = \arg\min_{1 \le k \le K}\mathrm{DMIC}_{n,1}(k)$.

Step 2. Define

$$\hat\theta_n^{(D,h)} = \arg\min_{1 \le k \le K}\mathrm{DMIC}_{n,h}(k)$$

and define

$$\hat\theta_n^{(P,h)} = \arg\min_{\hat\theta_n^{(1)} \le k \le K}\mathrm{PMIC}_{n,h}(k).$$

Step 3. If $\mathrm{DMIC}_{n,h}(\hat\theta_n^{(D,h)}) > \mathrm{PMIC}_{n,h}(\hat\theta_n^{(P,h)})$, then $(\hat\theta_n, \hat M_n) = (\hat\theta_n^{(P,h)}, 1)$; otherwise
$(\hat\theta_n, \hat M_n) = (\hat\theta_n^{(D,h)}, 2)$.

Theorem 4.1. Let the assumptions of Theorem 3.1 hold. Then, for $K \ge p_1$, $(\hat\theta_n, \hat M_n)$
is asymptotically efficient in the sense of (3.3).

Remark 6. Although (4.5) holds, it is worth mentioning that the trace terms in (4.5)
are not consistent estimators of their corresponding loss functions $L_{1,h}(k_1)$ and $L_{2,h}(k_2)$;
see (C.5) and (C.6) in Appendix C.

Remark 7. Following an argument similar to that used in the proof of Theorem 4.1, it
is not difficult to show that $(\hat\theta_n, \hat M_n)$ is also asymptotically efficient in stationary AR
models.

To illustrate the asymptotic results obtained in Theorem 4.1, we conduct a simulation study. The data generating processes (DGPs) are given by

DGP I      $x_t = -0.8x_{t-2} + \varepsilon_t$,
DGP II     $x_t = 0.3x_{t-1} - 0.8x_{t-2} + \varepsilon_t$,
DGP III    $x_t = 0.2x_{t-2} + 0.8x_{t-3} + \varepsilon_t$,
DGP IV     $x_t = 0.3x_{t-1} - 0.1x_{t-2} + 0.8x_{t-3} + \varepsilon_t$,
DGP V      $x_t = 0.9x_{t-1} - 0.81x_{t-2} + \varepsilon_t$,
DGP VI     $x_t = 0.6x_{t-1} - 0.36x_{t-2} + \varepsilon_t$,
DGP VII    $x_t = 0.9x_{t-1} - 0.81x_{t-2} + 0.91x_{t-3} + \varepsilon_t$,
DGP VIII   $x_t = 0.9x_{t-1} - 0.56x_{t-2} + 0.66x_{t-3} + \varepsilon_t$,

where the $\varepsilon_t$'s are independent and identically $N(0, 25)$ distributed. We aim to select two-step ($h = 2$) predictors for DGPs I-IV and three-step ($h = 3$) predictors for DGPs V-VIII using Procedure II with $C_n = \log n/n$, $2\log n/n$ and $3\log n/n$, which will be referred to as Procedures A, B and C, respectively. The candidate predictors are $(i, j)$, $i = 1, \ldots, 10$ and $j = 1, 2$, where $i$ is the order of the working AR model and $j = 1$ ($j = 2$) denotes the plug-in (direct) method. According to Section 2 above and Section 2 of Ing [10], the asymptotically optimal multistep predictors (that is, the order/method combinations with the minimal loss function values) for DGPs I-VIII are those listed in Table 2.

Table 2. Order/method combination with the minimal loss function value

h    DGP     Combination
2    I       (1,2)
2    II      (2,1)
2    III     (2,2)
2    IV      (3,1)
3    V       (1,2)
3    VI      (2,1)
3    VII     (2,2)
3    VIII    (3,1)

We generated 1000 replications for each of these DGPs and carried out predictor selection for each replication. The frequencies with which Procedures A, B and C select these combinations are shown in Table 3 for n = 150, 300, 500, 1000 and 2000. The simulation results are summarized as follows:

(1) Two-step predictions. Procedures A, B and C efficiently select the best order/method combination (listed in Table 2) regardless of whether the DGP is stationary or non-stationary. (Note that DGPs I and II are stationary, but DGPs III and IV are not.) In particular, the proportion of replications in which Procedures B and C select the best combination always exceeds 95 percent, except in DGPs II and IV with n = 150. Note that while the differences between the parameter values of DGPs I and II (or III and IV) are not sizable, different order/method combinations are required to attain the minimal loss function value (defined in (3.1) and (3.2)). Table 3 shows that these procedures are sensitive to small parameter changes and can efficiently switch to the "right track". However, the finite-sample performance of Procedure A appears to be slightly worse than that of Procedures B and C.

(2) Three-step predictions. Note that DGPs V and VI are stationary AR(2) models with AR coefficients satisfying $0 < a_1 < 1$ and $a_1^2 + a_2 = 0$. Ing [10] recently showed that (1,2) is asymptotically more efficient than (2,1) in DGP V, whereas (2,1) is asymptotically more efficient than (1,2) in DGP VI. Procedures A, B and C perform quite well in this subtle case. More specifically, for $(a_1, a_2) = (0.9, -0.81)$, they correctly choose (1,2) over 90 percent of the time for all sample sizes (except for Procedure A with n = 150 and 300). On the other hand, when $(a_1, a_2) = (0.6, -0.36)$, Procedures B and C select the other combination, (2,1), with rather high frequency for $n \ge 300$. While Procedure A performs slightly worse than the other two procedures, it still chooses (2,1) more than 89 percent of the time for $n \ge 500$. DGPs VII and VIII are unit root processes. In DGP VII, the direct method requires only two regressors to perform three-step predictions and, according to Section 2, (2,2) attains the minimal loss function value. On the other hand, (3,1) is the best combination for DGP VIII. Table 3 shows that the performance of Procedures A, B and C in DGPs VII and VIII is similar to that in DGPs V and VI.
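The link between Table 2 and the structure of each DGP can be checked numerically. The sketch below is our own illustration, not code from the paper: it iterates each AR recursion with the true coefficients to obtain the $h$-step predictor $\sum_j c_j x_{t-j}$ and reports how many of the most recent lags that predictor actually uses, which we take to be $p_h$ (an assumption about the definition given in Section 2). It also checks whether $z = 1$ is a root of $1 - a_1 z - \cdots - a_p z^p$. Exact rational arithmetic via Python's `fractions.Fraction` avoids spurious round-off in the cancellations, for example in DGP VII, where the coefficient of $x_{t-2}$ in the three-step predictor is exactly zero. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def unit_vector(j: int, p: int) -> List[Fraction]:
    """Coefficient vector of the observed value x_{t-j}."""
    return [Fraction(int(i == j)) for i in range(p)]


def h_step_coefficients(a: Sequence[Fraction], h: int) -> List[Fraction]:
    """Coefficients c with xhat_{t+h} = sum_j c[j] * x_{t-j}, obtained by iterating
    x_t = a[0] x_{t-1} + ... + a[p-1] x_{t-p} + e_t with the true coefficients."""
    p = len(a)
    coef: Dict[int, List[Fraction]] = {}
    for m in range(1, h + 1):
        c = [Fraction(0)] * p
        for i in range(1, p + 1):
            lag = m - i  # x_{t+lag} is a forecast if lag >= 1, an observation otherwise
            src = coef[lag] if lag >= 1 else unit_vector(-lag, p)
            c = [cj + a[i - 1] * sj for cj, sj in zip(c, src)]
        coef[m] = c
    return coef[h]


def p_h(a: Sequence[Fraction], h: int) -> int:
    """Number of most recent lags that the h-step predictor actually uses."""
    nonzero = [j for j, cj in enumerate(h_step_coefficients(a, h)) if cj != 0]
    return nonzero[-1] + 1 if nonzero else 0


def has_unit_root(a: Sequence[Fraction]) -> bool:
    """True when z = 1 is a root of 1 - a_1 z - ... - a_p z^p."""
    return 1 - sum(a) == 0


DGPS = {  # name: (AR coefficients a_1, ..., a_p as decimal strings, lead time h)
    "I": (["0", "-0.8"], 2), "II": (["0.3", "-0.8"], 2),
    "III": (["0", "0.2", "0.8"], 2), "IV": (["0.3", "-0.1", "0.8"], 2),
    "V": (["0.9", "-0.81"], 3), "VI": (["0.6", "-0.36"], 3),
    "VII": (["0.9", "-0.81", "0.91"], 3), "VIII": (["0.9", "-0.56", "0.66"], 3),
    "IX": (["0"] * 9 + ["0.2", "0.8"], 10), "X": (["1.5", "-0.5"], 10),
}

for name, (coefs, h) in DGPS.items():
    a = [Fraction(s) for s in coefs]
    print(f"DGP {name}: p_1 = {len(a)}, p_{h} = {p_h(a, h)}, unit root: {has_unit_root(a)}")
```

Running the loop gives $p_h = 1$ for DGPs I, V and VI, $p_h = 2$ for DGPs II, III, VII, IX and X, and $p_h = 3$ for DGPs IV and VIII, and it flags DGPs III, IV, VII, VIII, IX and X as having a unit root. The direct combinations in Table 2 appear exactly where $p_h < p_1$, except for DGP VI, where Ing [10] shows that the plug-in predictor (2,1) is more efficient; so $p_h$ alone does not settle the choice.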

To explore the finite-sample performance of these procedures for larger lead times, we also conduct a small Monte Carlo study using the following two unit root AR models:

DGP IX     $x_t = 0.2x_{t-10} + 0.8x_{t-11} + \varepsilon_t$,
DGP X      $x_t = 1.5x_{t-1} - 0.5x_{t-2} + \varepsilon_t$,
where the $\varepsilon_t$'s are independent and identically $N(0, 25)$ distributed. Our goal is to select ten-step ($h = 10$) predictors for these two DGPs from the family of predictors $\{(i, j): i = 1, \ldots, 20,\ j = 1, 2\}$. Note that DGP IX is an AR(11) model with $p_{10} = 2 \ll p_1 = 11$. Theorems 2.2 and 2.3 imply that the best combination for DGP IX is (2,2). On the other hand, DGP X is an AR(2) model with $p_{10} = p_1 = 2$ and, in view of Example 1, (2,1) is the best combination for DGP X. Our simulation results, based on 1000 replications for n = 500 and 1000, are reported in Table 4, which shows that when h increases to 10, Procedures A, B and C still work well, except in DGP X with n = 500. In this latter case, while the proposed procedures choose the best combination 70 to 80 percent of the time, they choose (1,2) about 20 percent of the time, indicating an underfitting problem. However, this difficulty is alleviated as n increases to 1000, in agreement with the asymptotic results given in Theorem 4.1.
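The two predictors compared throughout this section can be sketched in plain Python. This is an illustrative sketch, not the authors' simulation code: the function names are hypothetical, the series starts from zero initial values without burn-in, the error standard deviation is 5 to match $N(0, 25)$, and least squares is solved through the normal equations to keep the example dependency-free. The plug-in forecast fits an AR($k$) model to one-step-ahead data and iterates it $h$ times, feeding its own forecasts back in; the direct forecast regresses $x_{j+h}$ on $x_j(k) = (x_j, \ldots, x_{j-k+1})'$ and applies the fitted coefficients once.

```python
import random
from typing import List, Sequence


def simulate_ar(a: Sequence[float], n: int, sigma: float = 5.0, seed: int = 0) -> List[float]:
    """n observations of x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t with
    e_t ~ N(0, sigma^2) (sigma = 5 gives N(0, 25)) and zero initial values."""
    rng = random.Random(seed)
    p = len(a)
    x = [0.0] * p
    for _ in range(n):
        x.append(sum(a[i] * x[-1 - i] for i in range(p)) + rng.gauss(0.0, sigma))
    return x[p:]


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least squares coefficients from the normal equations, solved by Gaussian
    elimination with partial pivoting (adequate for the small orders used here)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta


def regressors(x: Sequence[float], k: int, last: int) -> List[List[float]]:
    """Rows x_j(k) = (x_j, x_{j-1}, ..., x_{j-k+1}) for j = k-1, ..., last (0-based)."""
    return [[x[j - i] for i in range(k)] for j in range(k - 1, last + 1)]


def plug_in_forecast(x: Sequence[float], k: int, h: int) -> float:
    """Fit AR(k) by least squares, then iterate it h steps ahead."""
    a_hat = ols(regressors(x, k, len(x) - 2), list(x[k:]))
    path = list(x[-k:])
    for _ in range(h):
        path.append(sum(a_hat[i] * path[-1 - i] for i in range(k)))
    return path[-1]


def direct_forecast(x: Sequence[float], k: int, h: int) -> float:
    """Regress x_{j+h} on x_j(k) by least squares and apply the fit to the last k values."""
    n = len(x)
    rows = regressors(x, k, n - 1 - h)
    target = [x[j + h] for j in range(k - 1, n - h)]
    b_hat = ols(rows, target)
    return sum(b_hat[i] * x[-1 - i] for i in range(k))


x = simulate_ar([0.0, -0.8], n=500, seed=1)  # DGP I
print(plug_in_forecast(x, k=2, h=2), direct_forecast(x, k=1, h=2), -0.8 * x[-1])
```

For DGP I both forecasts should be close to the optimal two-step predictor $-0.8x_n$; the direct forecast gets there with a single estimated coefficient, whereas the plug-in forecast needs the full AR(2) fit, which is in line with the (1,2) entry for DGP I in Table 2.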

Finally, we note that the choice of $C_n$ in Procedure II does influence its finite-sample results. While we do not intend to suggest the best $C_n$ in finite samples, the values of $C_n$ used in this paper may serve as good "initial values" for pursuing better performance based on Procedure II.

Table 3. Frequency of choosing predictors with minimal loss function values in 1000 replications

                 h = 2                                  h = 3
                 Procedure                              Procedure
n      Model (unit root)   A     B     C       Model (unit root)   A     B     C
150    I (No)              853   963   987     V (No)              882   976   993
300    I (No)              890   984   997     V (No)              880   974   993
500    I (No)              901   990   999     V (No)              913   985   997
1000   I (No)              921   990   997     V (No)              918   994   999
2000   I (No)              948   992   1000    V (No)              951   991   1000
150    II (No)             817   887   869     VI (No)             698   711   689
300    II (No)             845   968   983     VI (No)             827   936   915
500    II (No)             891   980   996     VI (No)             898   989   992
1000   II (No)             913   985   995     VI (No)             913   992   1000
2000   II (No)             923   990   999     VI (No)             941   997   1000
150    III (Yes)           844   972   991     VII (Yes)           841   970   993
300    III (Yes)           893   989   997     VII (Yes)           855   978   993
500    III (Yes)           916   992   998     VII (Yes)           911   989   998
1000   III (Yes)           939   993   999     VII (Yes)           917   995   999
2000   III (Yes)           950   997   1000    VII (Yes)           939   997   1000
150    IV (Yes)            780   894   878     VIII (Yes)          633   722   705
300    IV (Yes)            881   971   995     VIII (Yes)          835   901   903
500    IV (Yes)            881   973   993     VIII (Yes)          888   973   975
1000   IV (Yes)            906   980   994     VIII (Yes)          930   990   996
2000   IV (Yes)            926   989   999     VIII (Yes)          944   994   1000
Table 4. Frequency of choosing predictors with minimal loss function values in 1000 replications

                 h = 10
                 Procedure                              Procedure
n      Model (unit root)   A     B     C       Model (unit root)   A     B     C
500    IX (Yes)            967   1000  1000    X (Yes)             726   787   719
1000   IX (Yes)            981   997   1000    X (Yes)             808   927   936


Appendix A

Throughout this section, we only consider the case $k \ge 2$ (recall that $k$ denotes the order of the working AR model) because the results for the case $k = 1$ can be verified similarly. We start with some useful lemmas.

Lemma A.1. Assume that $\{x_t\}$ satisfies model (1.1) with $\{\varepsilon_t\}$ obeying (2.1). Then, for any $q > 0$ and $k \ge p_1$,

$$E\|\hat{R}_n^{-1}(k)\|^q = O(1), \qquad (A.1)$$

where $\hat{R}_n(k)$ is defined after (2.2) and, for a matrix $A$, $\|A\|^2 = \sup_{\|z\|=1} z'A'Az$, with $\|z\|$ denoting the Euclidean norm of the vector $z$.

Proof. (A.1) can be verified by an argument similar to that used in the proof of Lemma A.1 in Ing et al. [13]. The details are omitted. □

Lemma A.2. Assume that $\{x_t\}$ satisfies model (1.1) with $\{\varepsilon_t\}$ obeying (2.1) and, for some $q_1 > 2$, $\sup_{-\infty<t<\infty} E|\varepsilon_t|^{2q_1} < \infty$. Then, for any $0 < q < q_1$ and $k \ge p + 1$,

$$E\|\hat{R}_n^{-1}(k) - R_n^{*-1}(k)\|^q = O(n^{-q/2}), \qquad (A.2)$$

where

$$R_n^*(k) = \begin{pmatrix} \hat{\Gamma}_n(k-1) & 0_{k-1} \\ 0_{k-1}' & n^{-2}\sum_{j=k}^{n-1} N_j^2 \end{pmatrix},$$

$\hat{\Gamma}_n(k-1) = (1/n)\sum_{j=k}^{n-1} s_j(k-1)s_j'(k-1)$ and $N_j = x_j - \sum_{i=1}^{p} a_i x_{j-i}$.
Proof. First note that Lemma A.1 ensures that, for any $q > 0$,

$$E\|R_n^{*-1}(k)\|^q = O(1). \qquad (A.3)$$

We also have

$$\|\hat{R}_n^{-1}(k) - R_n^{*-1}(k)\|^q \le C_1\|\hat{R}_n^{-1}(k)\|^q\,\|R_n^{*-1}(k)\|^q\,\Big\|n^{-3/2}\sum_{j=k}^{n-1} s_j(k-1)N_j\Big\|^q, \qquad (A.4)$$

where $C_1$ is some positive constant. By analogy with Lemma A.3 in Ing et al. [13],

$$E\Big\|n^{-3/2}\sum_{j=k}^{n-1} s_j(k-1)N_j\Big\|^{q_1} = O(n^{-q_1/2}). \qquad (A.5)$$

Consequently, (A.2) follows from (A.1), (A.3)-(A.5) and Hölder's inequality. □



To prove Theorem 2.2, we also need the following two lemmas, the proofs of which are 
straightforward and hence omitted. 

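Lemma A.3. Assume that $\{x_t\}$ satisfies model (1.1) with $\sup_{-\infty<t<\infty} E|\varepsilon_t|^q < \infty$, where $q > 2$. Then, for $k \ge p_1$,

$$E\Big\|n^{-1/2}D_n(k)\sum_{j=k}^{n-1} x_j(k)\varepsilon_{j+1}\Big\|^q = O(1). \qquad (A.6)$$

Lemma A.4. Assume that $\{x_t\}$ satisfies model (1.1) with $\sup_{-\infty<t<\infty} E|\varepsilon_t|^r < \infty$ for some $r > 4$. Then, for $k \ge p_1$,

$$\lim_{n\to\infty} E(F_{n,k}) = 0, \qquad (A.7)$$

where

$$F_{n,k} = s_n'(k-1)M_h(k-1)\hat{\Gamma}_n^{-1}(k-1)\Big(\sum_{j=k}^{n-1} s_j(k-1)\varepsilon_{j+1}\Big)\frac{N_n\sum_{j=k}^{n-1} N_j\varepsilon_{j+1}}{\sum_{j=k}^{n-1} N_j^2}. \qquad (A.8)$$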



Proof of Theorem 2.2. Some algebraic manipulations give

$$x_{n+h} - \hat{x}_{n+h}(k) = \eta_{n,h} - x_n'(k)L_{n,h}(k)(\hat{a}_n(1,k) - a(k)), \qquad (A.9)$$

where $L_{n,h}(k)$ is defined after (3.4). We also have

$$nE\{x_n'(k)(L_{n,h}(k) - L_h(k))(\hat{a}_n(1,k) - a(k))\}^2 = E\Big\{x_n'(k)(L_{n,h}(k) - L_h(k))D_n'(k)\hat{R}_n^{-1}(k)\,n^{-1/2}D_n(k)\sum_{j=k}^{n-1}x_j(k)\varepsilon_{j+1}\Big\}^2 = E\{G_n^2(k)\}, \qquad (A.10)$$

where $L_h(k) = \sum_{j=0}^{h-1} b_j A'^{\,h-1-j}(k)$, with $A(k)$ defined in Section 1. Let $\hat{a}_n(k-1) = (\hat{a}(n,1), \ldots, \hat{a}(n,k-1))' = \Psi(k)\hat{a}_n(1,k)$, where $\Psi(k)$ is a $(k-1)\times k$ matrix whose $(i,j)$th component equals 0 if $j \le i$ and $-1$ if $j > i$. Then, by observing that $A(k)D_n'(k) = D_n'(k)\bar{A}(k)$ and $\hat{A}_n(k)\hat{D}_n'(k) = \hat{D}_n'(k)\hat{A}_n^*(k)$, where $\hat{D}_n(k)$ is $D_n(k)$ with $a_i$ replaced by $\hat{a}(n,i)$ for $i = 1, \ldots, k-1$,

$$\bar{A}(k) = \begin{pmatrix} S_M(k-1) & 0_{k-1} \\ 0_{k-1}' & 1 \end{pmatrix},$$

and $\hat{A}_n^*(k)$ is $\bar{A}(k)$ with $S_M(k-1)$ replaced by

$$\hat{S}_{M,n}(k-1) = \begin{pmatrix} \hat{a}_n'(k-1) \\ (I_{k-2},\ 0_{k-2}) \end{pmatrix},$$

we have

$$L_h(k)D_n'(k) = D_n'(k)\bar{L}_h(k) \qquad (A.11)$$

and

$$L_{n,h}(k)\hat{D}_n'(k) = \hat{D}_n'(k)L_{n,h}^*(k), \qquad (A.12)$$

where

$$\bar{L}_h(k) = \begin{pmatrix} M_h(k-1) & 0_{k-1} \\ 0_{k-1}' & \sum_{j=0}^{h-1} b_j \end{pmatrix}$$

and $L_{n,h}^*(k)$ is $\bar{L}_h(k)$ with $M_h(k-1)$ replaced by $M_{n,h}(k-1) = \sum_{j=0}^{h-1} b_j\hat{S}_{M,n}'^{\,h-1-j}(k-1)$.
(A.11) and (A.12) yield

$$(L_{n,h}(k) - L_h(k))D_n'(k) = L_{n,h}(k)(D_n'(k) - \hat{D}_n'(k)) + (\hat{D}_n'(k) - D_n'(k))L_{n,h}^*(k) + D_n'(k)(L_{n,h}^*(k) - \bar{L}_h(k)),$$

and hence

$$|G_n(k)| \le G_n^*(k), \qquad (A.13)$$

where $G_n^*(k) = (\mathrm{I}) + (\mathrm{II})$, with

$$(\mathrm{I}) = \|n^{-1/2}x_n(k)\|\,\|\hat{a}_n(k-1) - a(k-1)\|\,(\|L_{n,h}(k)\| + \|L_{n,h}^*(k)\|)\,G_{1,n}(k),$$
$$(\mathrm{II}) = (\|s_n(k-1)\| + |n^{-1/2}N_n|)\,\|L_{n,h}^*(k) - \bar{L}_h(k)\|\,G_{1,n}(k),$$
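and $G_{1,n}(k) = \|\hat{R}_n^{-1}(k)\|\,\|n^{-1/2}D_n(k)\sum_{j=k}^{n-1}x_j(k)\varepsilon_{j+1}\|$. By (A.10), (A.13), Lemmas A.1-A.3 and Hölder's inequality, it can be shown that

$$nE\{x_n'(k)(L_{n,h}(k) - L_h(k))(\hat{a}_n(1,k) - a(k))\}^2 \le E\{G_n^*(k)\}^2 = O(n^{-1}). \qquad (A.14)$$

Similarly, we have

$$E\Big\{x_n'(k)L_h(k)D_n'(k)\big(\hat{R}_n^{-1}(k) - R_n^{*-1}(k)\big)\,n^{-1/2}D_n(k)\sum_{j=k}^{n-1}x_j(k)\varepsilon_{j+1}\Big\}^2 = O(n^{-1}). \qquad (A.15)$$

By (A.11) and some algebraic manipulations,

$$E\Big\{x_n'(k)L_h(k)D_n'(k)R_n^{*-1}(k)\,n^{-1/2}D_n(k)\sum_{j=k}^{n-1}x_j(k)\varepsilon_{j+1}\Big\}^2 = E_{1,n}(k) + E_{2,n}(k) + E_{3,n}(k), \qquad (A.16)$$

where

$$E_{1,n}(k) = E\Big\{s_n'(k-1)M_h(k-1)\hat{\Gamma}_n^{-1}(k-1)\,n^{-1/2}\sum_{j=k}^{n-1}s_j(k-1)\varepsilon_{j+1}\Big\}^2,$$
$$E_{2,n}(k) = n\Big(\sum_{j=0}^{h-1}b_j\Big)^2 E\Big\{\frac{N_n\sum_{j=k}^{n-1}N_j\varepsilon_{j+1}}{\sum_{j=k}^{n-1}N_j^2}\Big\}^2,$$
$$E_{3,n}(k) = 2\Big(\sum_{j=0}^{h-1}b_j\Big)E(F_{n,k}).$$

By analogy with Theorem 1 of Ing [9],

$$\lim_{n\to\infty} E_{1,n}(k) = f_{1,h}(k-1). \qquad (A.17)$$

In view of Ing [8], it is straightforward to show that

$$\lim_{n\to\infty} E_{2,n}(k) = 2\sigma^2\Big(\sum_{j=0}^{h-1}b_j\Big)^2. \qquad (A.18)$$

Consequently, the desired result follows from (A.9), (A.14)-(A.18) and Lemma A.4. □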

Proof of Theorem 2.3. By analogy with Lemmas A.1-A.4, for $k \ge p_h$,

$$E\|\hat{R}_n^{-1}(k)\|^q = O(1), \qquad (A.19)$$
$$E\|\hat{R}_n^{-1}(k) - R_n^{*-1}(k)\|^4 = O(n^{-2}), \qquad (A.20)$$
E



n-X 



and 



where q > 0, 



j=k 



lim E(F n , k ) = 0, 



0(1) 



(A.21) 



(A.22) 



/ 1 n ~ h \ 

1 - V Sj -(fc-l)s$(fc-l) fe _! 

if-, ( J J 



3=k 



OJt-1 



^ n — ft. 

n 2 ^-f 



and 



F 



71, fc 



Sn(fc -l){(l/n) E"=fc s #~ !) s j( fc - 1)} 1 {E^k S )( fc ~ 1 fe} J: «E"i I 3^ 



(A.23) 



In addition, according to (1.9) and (3.5) of Ing and Sin [12], it can be shown that 



lim E{ n 

n — >oo 



Z^=fc x j \j=Q 



(A.24) 



As a result, Theorem 2.3 follows from (A.19)-(A.22), (A.24) and arguments similar to 
those used in the proofs of Theorem 2 in Ing [9] and Theorem 2.2 above. □ 



Appendix B

Lemma B.1 below provides (almost sure) asymptotic bounds for $\|\hat{\Gamma}_n(k-1) - \Gamma(k-1)\|$, $\|\hat{R}_n(k) - R_n^*(k)\|$ and $\|\hat{R}_n^{-1}(k)\|$ under a minimal moment condition, $\sup_{-\infty<t<\infty} E|\varepsilon_t|^\alpha < \infty$ for some $\alpha > 2$. As will be seen later, these bounds play subtle roles in our asymptotic analysis.

Lemma B.1. Assume that the assumptions of Theorem 3.1 hold. Then,

(i) for $k \ge 2$, $k \ge p_1$, and some $\iota > 0$,
$$\|\hat{\Gamma}_n(k-1) - \Gamma(k-1)\| = o(n^{-\iota}) \ \text{a.s.}; \qquad (B.1)$$

(ii) for $k \ge p_1$ and some $\eta > 0$,
$$\|\hat{R}_n(k) - R_n^*(k)\| = o(n^{-\eta}) \ \text{a.s.}; \qquad (B.2)$$

(iii) for $k \ge p_1$,
$$\|\hat{R}_n^{-1}(k)\| = O(\log\log n) \ \text{a.s.} \qquad (B.3)$$

Proof. First note that

$$\|\hat{\Gamma}_n(k-1) - \Gamma(k-1)\| \le \sum_{l=0}^{k-2}\sum_{m=0}^{k-2}\Big|n^{-1}\sum_{j=k}^{n-1}s_{j-l}s_{j-m} - \gamma_{l,m}\Big|,$$

where $\gamma_{l,m}$ is the $(l,m)$th component of $\Gamma(k-1)$. Therefore, (B.1) is ensured by showing that, for any $0 \le l \le k-2$ and $0 \le m \le k-2$,

$$\Big|n^{-1}\sum_{j=k}^{n-1}s_{j-l}s_{j-m} - \gamma_{l,m}\Big| = o(n^{-\iota}) \ \text{a.s.} \qquad (B.4)$$



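In the following, we only prove the case $l = m = 0$, since the proofs of the other cases can be obtained similarly. For $l = m = 0$, the left-hand side of (B.4) can be rewritten as

$$\Big|n^{-1}\sum_{j=k}^{n-1}\big(s_j^2 - \gamma_{0,0}^{(j)}\big) + n^{-1}\sum_{j=k}^{n-1}\big(\gamma_{0,0}^{(j)} - \gamma_{0,0}\big) - \frac{k\gamma_{0,0}}{n}\Big|, \qquad (B.5)$$

where $\gamma_{0,0}^{(j)} = \sigma^2\sum_{r=0}^{j-1}c_r^2$, with the $c_r$'s defined in Section 1. By observing that $\gamma_{0,0} = \sigma^2\sum_{r=0}^{\infty}c_r^2$ and $|c_r| \le C_1e^{-\beta_1 r}$ for all $r$ and some $C_1, \beta_1 > 0$, we have $(1/n)\sum_{j=k}^{n-1}(\gamma_{0,0}^{(j)} - \gamma_{0,0}) = O(1/n)$ and $k\gamma_{0,0}/n = O(1/n)$. In addition, straightforward calculations yield that

$$s_j^2 - \gamma_{0,0}^{(j)} = \sum_{i=1}^{j}c_{j-i}^2(\varepsilon_i^2 - \sigma^2) + 2\sum_{i_2=2}^{j}\sum_{i_1=1}^{i_2-1}c_{j-i_1}c_{j-i_2}\varepsilon_{i_1}\varepsilon_{i_2}. \qquad (B.6)$$

In view of (B.6), one obtains, by changing the order of summation, that

$$\sum_{j=n_1}^{n_2}\frac{s_j^2 - \gamma_{0,0}^{(j)}}{j^\theta} = \sum_{i=1}^{n_1}\Big(\sum_{j=n_1}^{n_2}\frac{c_{j-i}^2}{j^\theta}\Big)\eta_i + \sum_{i=n_1+1}^{n_2}\Big(\sum_{j=i}^{n_2}\frac{c_{j-i}^2}{j^\theta}\Big)\eta_i + 2\sum_{i_2=2}^{n_1}\sum_{i_1=1}^{i_2-1}\Big(\sum_{j=n_1}^{n_2}\frac{c_{j-i_1}c_{j-i_2}}{j^\theta}\Big)\varepsilon_{i_1}\varepsilon_{i_2} + 2\sum_{i_2=n_1+1}^{n_2}\sum_{i_1=1}^{i_2-1}\Big(\sum_{j=i_2}^{n_2}\frac{c_{j-i_1}c_{j-i_2}}{j^\theta}\Big)\varepsilon_{i_1}\varepsilon_{i_2} \equiv (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}) + (\mathrm{IV}),$$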
where $\eta_t = \varepsilon_t^2 - \sigma^2$, $\theta < 1$ and $\theta\alpha/2 > 1$. If we can show that, for any $1 \le n_1 < n_2 < \infty$,

$$E|(G)|^{\alpha/2} \le C\Big(\sum_{i=n_1}^{n_2}\frac{1}{i^{\xi_1}}\Big)^{\xi_2}, \qquad (B.7)$$

where $G = \mathrm{I}, \mathrm{II}, \mathrm{III}$ and $\mathrm{IV}$, and $C > 0$, $\xi_1 > 1$ and $\xi_2 > 1$ are some positive constants independent of $n_1$ and $n_2$ (but they can vary with $G$), then by Moricz [18], for all sufficiently large $n_1$,



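$$E\max_{n_1 \le l \le n_2}\Big|\sum_{j=n_1}^{l}\frac{s_j^2 - \gamma_{0,0}^{(j)}}{j^\theta}\Big|^{\alpha/2} \le C^*\Big(\sum_{i=n_1}^{n_2}\frac{1}{i^{\xi_1^*}}\Big)^{\xi_2^*}, \qquad (B.8)$$

where $C^* > 0$, $\xi_1^* > 1$ and $\xi_2^* > 1$ are some positive constants independent of $n_1$ and $n_2$. (B.8) and Kronecker's lemma yield

$$\frac{1}{n^\theta}\sum_{j=1}^{n}\big(s_j^2 - \gamma_{0,0}^{(j)}\big) = o(1) \ \text{a.s.} \qquad (B.9)$$

As a result, (B.1) holds with $\iota = 1 - \theta$.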

Without loss of generality, assume $2 < \alpha \le 4$. Then,



a/4 



l 



<^E E ^74^71 Ei^-'^-«i a/a ^i a/a 

n 2 -l 



— ^ 3 ( E „-6»q/2 E .e«/4 E -Sa/A 
h=m 3 1 32=31+1 32 



(h-jiY 



(B.10) 



\3=m 



< 



< 



° 4 ( E 7^72 ) 



where $C_i > 0$, $i = 2, \ldots, 4$, and $s > 1$ are some positive constants independent of $n_1$ and $n_2$, $1 < \xi_1 < \theta\alpha/2$ and $\xi_2 = \theta\alpha/(2\xi_1)$; the first inequality follows from Burkholder's inequality, the second from the fact that $\alpha/4 \le 1$ and changing the order of summation, and the third is ensured by $\sup_t E|\varepsilon_t|^\alpha < \infty$ and $|c_j| \le C_1e^{-\beta_1 j}$, which implies that, for all $n_1 \le j_1 < j_2 \le n_2$, $\sum_{i=1}^{n_1}|c_{j_1-i}c_{j_2-i}|^{\alpha/2} \le C_5|j_1 - j_2|^{-s}$ for some $C_5 > 0$. As a result, (B.7) holds with $G = \mathrm{I}$. The proof of (B.7) for the case $G = \mathrm{II}$ is similar, and the details are omitted. To show (B.7) for the case $G = \mathrm{III}$, note that



E 



ni ( (2 — 1 / n>2 

£ £ £ 

Z 2 =2 Ui=l \i=nj 



c j-h c j-h 



\e h >e k 



< 



/ m (h-i / "2 

* £ £ £ 

. \i 2 =2 Ui=l \J=ni 

/ ™1 ^2 — 1 / ™2 



c j-h c j-h 



c j-h c j-h 



x/2 
2v a/4 



2 s a/4 



(B.ll) 



= M" ££ £^-. , ■ 

By arguments similar to those used to verify the second to fifth inequalities in (B.10), the desired result follows. Similarly, it can be shown that (B.7) holds for the case $G = \mathrm{IV}$.

To show (B.2), first observe that

$$\|\hat{R}_n(k) - R_n^*(k)\| \le \sqrt{2}\sum_{l=0}^{k-2}\Big|n^{-3/2}\sum_{j=k}^{n-1}s_{j-l}N_j\Big|.$$

Therefore, it suffices to show that, for $l = 0, \ldots, k-2$ and some $\eta > 0$,

$$n^{-3/2}\sum_{j=k}^{n-1}s_{j-l}N_j = o(n^{-\eta}) \ \text{a.s.} \qquad (B.12)$$

We only verify (B.12) for the case $l = 0$, since the proof for $l > 0$ can be obtained similarly. Let $\max\{1, (1/2) + (2/\alpha)\} < \theta_1 < 3/2$. Some algebraic manipulations yield



£^ = - 2 £i£^ + ££^+ £ £ 



"2 H 3 



ri2 «2 



7 

j—ni m— 1 

ni m — 1 ri2 



m— 1 j—n\ 



m— ni + 1 j—m 



£££%^ + £ ££¥* 



ri2 m— 1 ?i2 



m— 2 /—I j—n-\ 
ni ?— 1 n; 



m-rii + 1 / — 1 j—m 
ri2 i— 1 n; 



(B.13) 



£ £ £ t?t £ »^ + £ £ £ %r £ ™ £ ' 

J=2m=lj'=ni ■ y i=m + l m=l j=2 J 

(J) + (77) + (777) + (IV) + ( V ) + ( VI) + ( V77) 



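It is clear that

$$E|(\mathrm{I})|^{\alpha/2} \le C_6\Big(\sum_{i=n_1}^{n_2}\frac{1}{i^{\xi_1}}\Big)^{\xi_2}, \qquad (B.14)$$

where $\xi_1 = \theta_1$ and $\xi_2 = \alpha/2$. By an argument similar to that used in (B.10),

$$E|(W)|^{\alpha/2} \le C_7\Big(\sum_{i=n_1}^{n_2}\frac{1}{i^{\xi_1}}\Big)^{\xi_2}, \qquad (B.15)$$

where $W = \mathrm{II}, \mathrm{III}$, $1 < \xi_1 < \theta_1\alpha/2$ and $\xi_2 = \theta_1\alpha/(2\xi_1)$. An argument similar to that used in (B.11) yields

$$E|(W)|^{\alpha/2} \le C_8\Big(\sum_{i=n_1}^{n_2}\frac{1}{i^{\xi_1}}\Big)^{\xi_2}, \qquad (B.16)$$

where $W = \mathrm{IV}, \mathrm{V}, \mathrm{VI}, \mathrm{VII}$, $1 < \xi_1 < (2\theta_1 - 1)\alpha/4$ and $\xi_2 = (2\theta_1 - 1)\alpha/(4\xi_1)$. Consequently, (B.12) (with $\eta = (3/2) - \theta_1$) follows from (B.13)-(B.16), Moricz [18] and Kronecker's lemma. To show (B.3), observe that $\|\hat{R}_n^{-1}(k)\| \le \|\hat{R}_n^{-1}(k)\|\,\|\hat{R}_n(k) - R_n^*(k)\|\,\|R_n^{*-1}(k)\| + \|R_n^{*-1}(k)\|$. By (3.23) of Lai and Wei [14] and (3.2) of Lai and Wei [15],

$$\|R_n^{*-1}(k)\| = O(\log\log n) \ \text{a.s.}$$

This and (B.2) yield (B.3). □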

To prove Theorem 3.1, the following auxiliary lemma is required. Its proof can be found in Appendix B of Ing et al. [11].

Lemma B.2. Assume that the assumptions of Theorem 3.1 hold. Then, for $k \ge \max\{2, p_1\}$,

$$\sum_{i=m_h}^{n-h}F_{i,k} = o(n) \ \text{a.s.}, \qquad (B.17)$$

where $F_{i,k}$ is defined in Lemma A.4.

We also need a few elementary facts.

Lemma B.3. Let $\{z_n\}$ be a sequence of real numbers.

(i) If $z_n \ge 0$, $n^{-1}\sum_{j=1}^{n}z_j = O(1)$ and, for some $\xi > 1$, $\liminf_{n\to\infty} v_n/n^\xi > 0$, then $\sum_{j=1}^{n}z_j/v_j = O(1)$.

(ii) If $n^{-1}\sum_{j=1}^{n}z_j = o(1)$, then $\sum_{j=1}^{n}z_j/j = o(\log n)$.
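Part (ii) of Lemma B.3 is what turns Lemma B.2 into the bound $(\mathrm{III}) = o(\log n)$ in the proof of Theorem 3.1. Reading (ii) as "if $n^{-1}\sum_{j=1}^n z_j = o(1)$ then $\sum_{j=1}^n z_j/j = o(\log n)$" (our reading of the extracted statement), a standard summation-by-parts argument, sketched here for convenience rather than taken from the paper, gives it. Write $S_n = \sum_{j=1}^n z_j$ and $S_0 = 0$:

```latex
\sum_{j=1}^{n}\frac{z_j}{j}
  = \sum_{j=1}^{n}\frac{S_j - S_{j-1}}{j}
  = \frac{S_n}{n} + \sum_{j=1}^{n-1} S_j\Big(\frac{1}{j} - \frac{1}{j+1}\Big)
  = \frac{S_n}{n} + \sum_{j=1}^{n-1}\frac{S_j}{j}\cdot\frac{1}{j+1}.
```

Since $S_j/j \to 0$, for any $\epsilon > 0$ there is a $J$ with $|S_j/j| < \epsilon$ for $j \ge J$, so the last sum is bounded by $\sum_{j<J}|S_j|/(j(j+1)) + \epsilon\sum_{j=J}^{n-1}1/(j+1) \le C_J + \epsilon\log n$; together with $S_n/n = o(1)$ this yields $\sum_{j=1}^n z_j/j = o(\log n)$.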
Proof of Theorem 3.1. We only prove the case $k \ge 2$, since the proof of the case $k = 1$ is similar. By Chow [1] and an analogy with (3.8) of Ing [10],

$$\mathrm{APEP}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = \sum_{i=m_h}^{n-h}\{x_i'(k)L_{i,h}(k)(\hat{a}_i(1,k) - a(k))\}^2(1 + o(1)) + O(1) \ \text{a.s.}$$

Straightforward calculations give

$$\sum_{i=m_h}^{n-h}\{x_i'(k)(L_{i,h}(k) - L_h(k))(\hat{a}_i(1,k) - a(k))\}^2 = \sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)(L_{i,h}(k) - L_h(k))D_i'(k)\hat{R}_i^{-1}(k)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-1}x_j(k)\varepsilon_{j+1}\Big\}^2. \qquad (B.18)$$

By Lai and Wei [14] and (3.1) and (3.2) of Lai and Wei [15], we have 



(B.19) 



\& n (k-l)-a(k- 1)|| = 



\\L* h (k)-L h (k)\\ = 



logn 



logn 



1/2 



1/2 



= 0(1) a.s., 
Mfc)/VH|| = 0((loglogn) 1 / 2 ) 



(B.20) 

(B.21) 

(B.22) 
(B.23) 



In addition, by Lemma 1 of Wei [21], the law of the iterated logarithm, and (3.3) of Lai and Wei [15],

$$\Big\|\frac{1}{\sqrt{n}}D_n(k)\sum_{j=k}^{n-1}x_j(k)\varepsilon_{j+1}\Big\| = o\big((\log n)^{\delta}(\log\log n)^{1/2}\big) \ \text{a.s.}, \qquad (B.24)$$

where $\delta > 1/\alpha$. As a result, by (A.11), (A.12), (B.3), (B.19)-(B.24) and the fact that $N_n/\sqrt{n} = O((\log\log n)^{1/2})$ a.s., one obtains

$$\sum_{i=m_h}^{n-h}\{x_i'(k)(L_{i,h}(k) - L_h(k))(\hat{a}_i(1,k) - a(k))\}^2 = O(1) \ \text{a.s.} \qquad (B.25)$$



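Armed with (B.2), (B.3) and the fact that $\|R_n^{*-1}(k)\| = O(\log\log n)$ a.s. (which is given after (B.16)), it can be shown that

$$\|\hat{R}_n^{-1}(k) - R_n^{*-1}(k)\| = o\big(n^{-\eta}(\log\log n)^2\big) \ \text{a.s.}, \qquad (B.26)$$

where $\eta > 0$ is some positive constant. Since (A.11) yields, for some $C_1 > 0$, $\|D_i(k)L_h'(k)x_i(k)\| = \|\bar{L}_h'(k)D_i(k)x_i(k)\| \le C_1(\|s_i(k-1)\| + |i^{-1/2}N_i|)$, we obtain

$$\sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)L_h(k)D_i'(k)\big(\hat{R}_i^{-1}(k) - R_i^{*-1}(k)\big)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-1}x_j(k)\varepsilon_{j+1}\Big\}^2 \le C_2\sum_{i=m_h}^{n-h}\frac{1}{i}\Big(\|s_i(k-1)\|^2 + \frac{N_i^2}{i}\Big)\|\hat{R}_i^{-1}(k) - R_i^{*-1}(k)\|^2\Big\|\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-1}x_j(k)\varepsilon_{j+1}\Big\|^2 = O(1) \ \text{a.s.}, \qquad (B.27)$$

where $C_2 > 0$ is some positive constant independent of $n$, and the equality follows from (B.24), (B.26), $N_n/\sqrt{n} = O((\log\log n)^{1/2})$ a.s., $n^{-1}\sum_{i=1}^{n}\|s_i(k-1)\|^2 = O(1)$ a.s. and (i) of Lemma B.3.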

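By (A.11) and some algebraic manipulations,

$$\sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)L_h(k)D_i'(k)R_i^{*-1}(k)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-1}x_j(k)\varepsilon_{j+1}\Big\}^2 = (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}),$$

where

$$(\mathrm{I}) = \sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{s_i'(k-1)M_h(k-1)\hat{\Gamma}_i^{-1}(k-1)\frac{1}{\sqrt{i}}\sum_{j=k}^{i-1}s_j(k-1)\varepsilon_{j+1}\Big\}^2,$$
$$(\mathrm{II}) = \Big(\sum_{j=0}^{h-1}b_j\Big)^2\sum_{i=m_h}^{n-h}\Big(\frac{N_i\sum_{j=k}^{i-1}N_j\varepsilon_{j+1}}{\sum_{j=k}^{i-1}N_j^2}\Big)^2,$$
$$(\mathrm{III}) = 2\Big(\sum_{j=0}^{h-1}b_j\Big)\sum_{i=m_h}^{n-h}\frac{F_{i,k}}{i}.$$

According to (B.21) and analogies with (A.1) and Theorem 3.1 of Ing [10],

$$(\mathrm{I}) = \sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{s_i'(k-1)M_h(k-1)\Gamma^{-1}(k-1)\frac{1}{\sqrt{i}}\sum_{j=k}^{i-1}s_j(k-1)\varepsilon_{j+1}\Big\}^2 + o(\log n) = f_{1,h}(k-1)\log n + o(\log n) \ \text{a.s.}$$

By Theorem 4 of Wei [21],

$$(\mathrm{II}) = 2\Big(\sum_{j=0}^{h-1}b_j\Big)^2\sigma^2\log n + o(\log n) \ \text{a.s.}$$

In view of Lemma B.2 and (ii) of Lemma B.3, one obtains

$$(\mathrm{III}) = o(\log n) \ \text{a.s.}$$

As a result,

$$\sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)L_h(k)D_i'(k)R_i^{*-1}(k)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-1}x_j(k)\varepsilon_{j+1}\Big\}^2 = \Big\{2\sigma^2\Big(\sum_{j=0}^{h-1}b_j\Big)^2 + f_{1,h}(k-1)\Big\}\log n + o(\log n) \ \text{a.s.} \qquad (B.28)$$

Consequently, (3.6) follows from (B.18), (B.25), (B.27), (B.28) and the Cauchy-Schwarz inequality. □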

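To analyze $\mathrm{APED}_{n,h}(k)$, Lemma B.4 is required.

Lemma B.4. Let the assumptions of Theorem 3.1 hold. Then,

$$\sum_{i=m_h}^{n-h}\frac{x_i^2\big(\sum_{j=k}^{i-h}x_j\eta_{j,h}\big)^2}{\big(\sum_{j=k}^{i-h}x_j^2\big)^2} = 2\Big(\sum_{j=0}^{h-1}b_j\Big)^2\sigma^2\log n + o(\log n) \ \text{a.s.} \qquad (B.29)$$

Proof. Following arguments similar to those used in the proofs of Lemma 2 and Theorem 1 of Ing and Sin [12], one obtains

$$\liminf_{n\to\infty}\frac{\log\log n}{n^2}\sum_{j=1}^{n}x_j^2 > 0 \ \text{a.s.} \qquad (B.30)$$

and

$$x_n = O\big((n\log\log n)^{1/2}\big) \ \text{a.s.} \qquad (B.31)$$

By the Borel-Cantelli lemma,

$$\varepsilon_n = o(n^{1/2}) \ \text{a.s.} \qquad (B.32)$$

In addition, it is not difficult to show that, for $\theta > 1/2$ and $l \ge 1$,

$$\frac{1}{n^\theta}\sum_{j=1}^{n-l}\varepsilon_j\varepsilon_{j+l} = o(1) \ \text{a.s.} \qquad (B.33)$$

(B.30)-(B.33) together imply

$$\sum_{i=m_h}^{n-h}\frac{x_i^2\big(\sum_{j=k}^{i-h}x_j\eta_{j,h}\big)^2}{\big(\sum_{j=k}^{i-h}x_j^2\big)^2} = \Big(\sum_{j=0}^{h-1}b_j\Big)^2\sum_{i=m_h}^{n-h}\frac{x_i^2\big(\sum_{j=k}^{i-1}x_j\varepsilon_{j+1}\big)^2}{\big(\sum_{j=k}^{i-1}x_j^2\big)^2} + o(\log n) \ \text{a.s.} \qquad (B.34)$$

Consequently, (B.29) follows from (B.34) and the fact that

$$\sum_{i=m_h}^{n-h}\frac{x_i^2\big(\sum_{j=k}^{i-1}x_j\varepsilon_{j+1}\big)^2}{\big(\sum_{j=k}^{i-1}x_j^2\big)^2} = 2\sigma^2\log n + o(\log n) \ \text{a.s.},$$

which is guaranteed by (2.15) of Ing and Sin [12]. □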



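Proof of Theorem 3.2. We only prove the case $k \ge 2$, since the proof of the case $k = 1$ is similar. By the same reasoning as in (B.18), we have

$$\mathrm{APED}_{n,h}(k) - \sum_{i=m_h}^{n-h}\eta_{i,h}^2 = (1 + o(1))\sum_{i=m_h}^{n-h}\Big\{x_i'(k)\Big(\sum_{j=k}^{i-h}x_j(k)x_j'(k)\Big)^{-1}\sum_{j=k}^{i-h}x_j(k)\eta_{j,h}\Big\}^2 + O(1) \ \text{a.s.} \qquad (B.35)$$

Observe that

$$\sum_{i=m_h}^{n-h}\Big\{x_i'(k)\Big(\sum_{j=k}^{i-h}x_j(k)x_j'(k)\Big)^{-1}\sum_{j=k}^{i-h}x_j(k)\eta_{j,h}\Big\}^2 = \sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)D_i'(k)\hat{R}_{i,h}^{-1}(k)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-h}x_j(k)\eta_{j,h}\Big\}^2. \qquad (B.36)$$

According to (B.30), (B.31) and arguments similar to those used to obtain (B.24) and (B.26),

$$\Big\|\frac{1}{\sqrt{n}}D_n(k)\sum_{j=k}^{n-h}x_j(k)\eta_{j,h}\Big\| = o\big((\log n)^{\delta}(\log\log n)^{1/2}\big) \ \text{a.s.}$$

and

$$\|\hat{R}_{n,h}^{-1}(k) - R_{n,h}^{*-1}(k)\| = O\big(n^{-\eta}(\log\log n)^2\big) \ \text{a.s.},$$

where $\delta > 1/\alpha$ and $\eta > 0$. These facts and reasoning similar to that used in (B.27) yield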



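$$\sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)D_i'(k)\big(\hat{R}_{i,h}^{-1}(k) - R_{i,h}^{*-1}(k)\big)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-h}x_j(k)\eta_{j,h}\Big\}^2 = O(1) \ \text{a.s.} \qquad (B.37)$$

Now,

$$\sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{x_i'(k)D_i'(k)R_{i,h}^{*-1}(k)\frac{1}{\sqrt{i}}D_i(k)\sum_{j=k}^{i-h}x_j(k)\eta_{j,h}\Big\}^2 = (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}), \qquad (B.38)$$

where

$$(\mathrm{I}) = \sum_{i=m_h}^{n-h}\frac{1}{i}\Big\{s_i'(k-1)\Big(\frac{1}{i}\sum_{j=k}^{i-h}s_j(k-1)s_j'(k-1)\Big)^{-1}\frac{1}{\sqrt{i}}\sum_{j=k}^{i-h}s_j(k-1)\eta_{j,h}\Big\}^2,$$
$$(\mathrm{II}) = \sum_{i=m_h}^{n-h}\frac{x_i^2\big(\sum_{j=k}^{i-h}x_j\eta_{j,h}\big)^2}{\big(\sum_{j=k}^{i-h}x_j^2\big)^2}$$

and

$$(\mathrm{III}) = \sum_{i=m_h}^{n-h}\frac{F_i(k)}{i},$$

with $F_i(k)$ defined in (A.23). By analogy with Theorem 3.2 of Ing [10],

$$(\mathrm{I}) = f_{2,h}(k-1)\log n + o(\log n) \ \text{a.s.} \qquad (B.39)$$

According to Lemma B.4,

$$(\mathrm{II}) = 2\sigma^2\Big(\sum_{j=0}^{h-1}b_j\Big)^2\log n + o(\log n) \ \text{a.s.} \qquad (B.40)$$

By reasoning similar to that used in the proof of Lemma B.2 (see Appendix B of Ing et al. [11]),

$$\sum_{i=m_h}^{n-h}F_i(k) = o(n) \ \text{a.s.},$$

and hence

$$(\mathrm{III}) = o(\log n) \ \text{a.s.} \qquad (B.41)$$

Consequently, (3.8) follows from (B.35)-(B.41). □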



To prove Theorem 3.3, we need a technical lemma, the proof of which can also be found in Appendix B of Ing et al. [11].

Lemma B.5. Let the assumptions of Theorem 3.1 hold. Then, for $1 \le k < p_h$ and $h \ge 1$,

$$\sum_{i=m_h}^{n-h}\big[x_i'(k)(\hat{a}_i(h,k) - \bar{a}(h,k))\big]^2 = o(n) \ \text{a.s.}, \qquad (B.42)$$



434 C.-K. Ing, J.-L. Lin and S.-H. Yu 

where $\bar{a}(h,k) = 1$ if $1 = k < p_h$, and

$$\bar{a}(h,k) = U_{k\times(k-1)}\,\alpha_h(k-1) + (1, 0, \ldots, 0)' \qquad (B.43)$$

if $1 < k < p_h$, where $U_{k\times(k-1)} = (u_{ij})$ is a $k\times(k-1)$ matrix with $u_{ij} = 1$ if $i = j$, $u_{ij} = -1$ if $i - j = 1$ and $u_{ij} = 0$ otherwise, and $\alpha_h(k-1) = \lim_{t\to\infty}\alpha_h^{(t)}(k-1)$, with

$$\alpha_h^{(t)}(k-1) = \arg\min_{(u_1,\ldots,u_{k-1})'\in\mathbb{R}^{k-1}} E\big(s_{t+1} + \cdots + s_{t+h} - u_1s_t - \cdots - u_{k-1}s_{t-k+2}\big)^2.$$

Proof of Theorem 3.3. By analogy with (B.35),

$$\mathrm{APED}_{n,h}(k) = \sum_{i=m_h}^{n-h}\{\eta_{i,h} + x_i'(p+1)(a(h,p+1) - \hat{a}_i(h,k))\}^2 = \sum_{i=m_h}^{n-h}\eta_{i,h}^2 + (1 + o(1))\sum_{i=m_h}^{n-h}\{x_i'(p+1)(a(h,p+1) - \hat{a}_i(h,k))\}^2 + O(1) \ \text{a.s.}, \qquad (B.44)$$

where the $\hat{a}_i(h,k)$ in (B.44) is viewed as a $(p+1)$-dimensional vector with undefined entries set to 0. Direct calculations yield

$$\sum_{i=m_h}^{n-h}\{x_i'(p+1)(a(h,p+1) - \hat{a}_i(h,k))\}^2 = (a(h,p+1) - \bar{a}(h,k))'V_{n-h}(a(h,p+1) - \bar{a}(h,k)) - 2\sum_{i=m_h}^{n-h}x_i'(p+1)(a(h,p+1) - \bar{a}(h,k))\,x_i'(p+1)(\hat{a}_i(h,k) - \bar{a}(h,k)) + \sum_{i=m_h}^{n-h}\big[x_i'(k)(\hat{a}_i(h,k) - \bar{a}(h,k))\big]^2, \qquad (B.45)$$

where $V_{n-h} = \sum_{i=m_h}^{n-h}x_i(p+1)x_i'(p+1)$ and the $\bar{a}(h,k)$ in the first two terms on the right-hand side of (B.45) is viewed as a $(p+1)$-dimensional vector with undefined entries set to 0. By (3.2) of Lai and Wei [15],

$$\liminf_{n\to\infty} n^{-1}\lambda_{\min}(V_{n-h}) > 0 \ \text{a.s.} \qquad (B.46)$$

Consequently, (3.9) follows from (B.44)-(B.46), (B.42), the Cauchy-Schwarz inequality and the fact that $a(h,p+1) - \bar{a}(h,k) \ne 0$. □

Proof of Theorem 3.4. Since $f_{2,1}(k-1) = (k-1)\sigma^2$, Theorems 3.2 and 3.3 imply

$$P(\hat{k}_n^{(1)} = p_1 \ \text{eventually}) = 1. \qquad (B.47)$$

Applying (B.47) and Theorems 3.1-3.3, Theorem 3.4 follows. □

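Appendix C

In this Appendix, we sketch the proof of Theorem 4.1. Applying an argument used in the proof of Theorem 3.5 in Wei [22], it can be shown that, for $k < p_h$,

$$\liminf_{n\to\infty}\big\{\hat{\sigma}^2_{D,n}(h,k) - \hat{\sigma}^2_{D,n}(h,p_h)\big\} > 0 \ \text{a.s.} \qquad (C.1)$$

Armed with the probability results obtained in Appendix B, one obtains, for $k_1 \ge p_1$ and $k_2 \ge p_h$,

$$|\hat{\sigma}^2_{P,n}(h,k_1) - \hat{\sigma}^2_{D,n}(h,k_2)| = o(\log n/n) \ \text{a.s.}, \qquad (C.2)$$
$$|\hat{\sigma}^2_{P,n}(h,k_1) - \hat{\sigma}^2_{P,n}(h,p_1)| = o(\log n/n) \ \text{a.s.}, \qquad (C.3)$$
$$|\hat{\sigma}^2_{D,n}(h,k_2) - \hat{\sigma}^2_{D,n}(h,p_h)| = o(\log n/n) \ \text{a.s.} \qquad (C.4)$$

In addition, it can be shown that, for $k_1 \ge p_1$ and $k_2 \ge p_h$,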



+ -1) >C„ + o(C„) a.s. 



(C.5) 



and 



{/n-h \ _i Ai-2/i+l \ "| 



'h-l 



2 



(C.6) 



= j^E>j +hh(k2-l)jC n + o(C n ) a.s. 
Consequently, the asymptotic efficiency of $(\hat{\theta}_n, \hat{M}_n)$ follows from (C.1)-(C.6).






References

[1] Chow, Y.S. (1965). Local convergence of martingale and the law of large numbers. Ann. Math. Statist. 36 552-558. MR0182040
[2] Bercu, B. (2004). On the convergence of moments in the almost sure central limit theorem for martingales with statistical applications. Stochastic Process. Appl. 111 157-173. MR2049573
[3] Chevillon, G. and Hendry, D.F. (2005). Non-parametric direct multi-step estimation for forecasting economic processes. Int. J. Forecast. 21 201-218.
[4] De Luna, X. and Skouras, K. (2003). Choosing a model selection strategy. Scand. J. Statist. 30 113-128. MR1963896
[5] Findley, D.F. (1983). On the use of multiple models for multi-period forecasting. In Proc. Bus. Econom. Statist. Sec. 528-531. Amer. Statist. Assoc., Alexandria, VA.
[6] Findley, D.F. (1984). On some ambiguities associated with the fitting of ARMA models to time series. J. Time Ser. Anal. 5 213-225. MR0782076
[7] Fuller, W.A. and Hasza, D.P. (1981). Properties of predictors for autoregressive time series. J. Amer. Statist. Assoc. 76 155-161. MR0608187
[8] Ing, C.K. (2001). A note on mean-squared prediction errors of the least squares predictors in random walk models. J. Time Ser. Anal. 22 711-724. MR1867394
[9] Ing, C.K. (2003). Multistep prediction in autoregressive processes. Econometric Theory 19 254-279. MR1966030
[10] Ing, C.K. (2004). Selecting optimal multistep predictors for autoregressive processes of unknown order. Ann. Statist. 32 693-722. MR2060174
[11] Ing, C.K., Lin, J.L. and Yu, S.H. (2008). Toward optimal multistep forecasts in nonstationary autoregressions. Technical report, Inst. Statist. Sci., Academia Sinica.
[12] Ing, C.K. and Sin, C.Y. (2006). On prediction errors in regression models with nonstationary regressors. In Time Series and Related Topics: In Memory of Ching-Zong Wei. IMS Lecture Notes Monogr. Ser. 52 60-71. Beachwood, OH: Inst. Math. Statist.
[13] Ing, C.K., Sin, C.Y. and Yu, S.H. (2009). Prediction errors in nonstationary autoregressions of infinite order. Econometric Theory. To appear.
[14] Lai, T.L. and Wei, C.Z. (1982). Least squares estimates in stochastic regression models with application to identification and control systems. Ann. Statist. 10 154-166. MR0642726
[15] Lai, T.L. and Wei, C.Z. (1983). Asymptotic properties of general autoregressive models and strong consistency of least squares estimates of their applications. J. Multivariate Anal. 13 1-23. MR0695924
[16] Lin, J.L. and Tsay, R.S. (1996). Co-integration constraint and forecasting: An empirical examination. J. Appl. Econometrics 11 519-538.
[17] Lin, J.L. and Wei, C.Z. (2006). Forecasting unstable processes. In Time Series and Related Topics: In Memory of Ching-Zong Wei. IMS Lecture Notes Monogr. Ser. 52 72-92. Beachwood, OH: Inst. Math. Statist.
[18] Moricz, F. (1976). Moment inequalities and the strong laws of large numbers. Z. Wahrsch. Verw. Gebiete 35 299-314. MR0407950
[19] Rissanen, J. (1986). Order estimation by accumulated prediction errors. J. Appl. Probab. 23A 55-61. In Essays in Time Series and Allied Processes (J. Gani and M.P. Priestley, eds.) MR0803162
[20] Tiao, G.C. and Tsay, R.S. (1994). Some advances in non-linear and adaptive modelling in time-series. J. Forecast. 13 109-131.



Forecasts non- stationary autoregressions 






[21] Wei, C.Z. (1987). Adaptive prediction by least squares predictors in stochastic regression models with application to time series. Ann. Statist. 15 1667-1682. MR0913581
[22] Wei, C.Z. (1992). On predictive least squares principles. Ann. Statist. 20 1-42. MR1150333

Received December 2007 and revised August 2008 



